Preface


The 1991 Complex Systems Summer School continued the traditions of its
predecessors: a wide array of topics was discussed, students were by turns excited
and exciting, and the editors of this volume of lectures were left with the task of
finding something different to say about an event that has become almost stable
by its fourth year. The alternative we have chosen is to be mercifully brief in this
preface to the chapters based on the fourth summer school. We can start off by
safely reporting that none of our participants cracked the problem of providing a
completely satisfying definition of complexity, though not for want of trying.

As in the previous volumes, the contents of this book reflect the topics discussed
in the 1991 Summer School. However, some of the lectures given there do not
appear within; some of those will appear in next year’s proceedings. For completeness,
we list here those lectures which are not present within this volume: Chaos (Predrag
Cvitanovic), Statistical Mechanics of Neural Networks (Sara Solla), The Ecology
of Computation (Bernardo Huberman), Neural Network Algorithms and Architectures
(John Denker), Neural Basis of Vision in Insects (Nicholas Strausfeld), and
Spin Glass Approaches to Protein Folding (Peter Wolynes).

Following an innovation begun last year, we are pleased to include a number
of contributions from the participants themselves. These are the result of research
by individuals or working groups set up during the school. The results are quite
impressive.

Neural Network Models for Pattern
Recognition and Associative Memory


This review outlines some fundamental neural network modules for associative
memory, pattern recognition, and category learning. Included are
discussions of the McCulloch-Pitts neuron, perceptrons, adaline and madaline,
back propagation, the learning matrix, linear associative memory, embedding
fields, instars and outstars, the avalanche, shunting competitive
networks, competitive learning, computational mapping by instar/outstar
families, adaptive resonance theory, the cognitron and neocognitron, and
simulated annealing. Adaptive filter formalism provides a unified notation.
Activation laws include additive and shunting equations. Learning laws
include back-coupled error correction, Hebbian learning, and gated instar
and outstar equations. Also included are discussions of real-time and
off-line modeling, stable and unstable coding, supervised and unsupervised
learning, and self-organization.

Gail A. Carpenter


1.  INTRODUCTION 

Neural  network  analysis  exists  on  many  different  levels.  At  the  highest  level  (Figure 
1)  we  study  theories,  architectures,  and  hierarchies  for  big  problems  such  as  early 
vision,  speech,  arm  movement,  reinforcement,  and  cognition.  Each  architecture  is 
typically  constructed  from  pieces,  or  modules,  designed  to  solve  parts  of  a  bigger 
problem.  These  pieces  might  be  used,  for  example,  to  associate  pairs  of  patterns 
with  one  another  or  to  sort  a  class  of  patterns  into  various  categories.  In  turn,  for 
every  such  module  there  is  a  bewildering  variety  of  examples,  equations,  simulations, 
theorems,  and  implementations,  studied  under  various  conditions  such  as  fast  or 
slow  input  presentation  rates,  supervised  or  unsupervised  learning,  and  real-time 
or  off-line  dynamics.  These  variations  and  their  applications  are  now  the  subject  of 
hundreds of talks and papers each year. In this review I will focus on the middle
level,  on  some  of  the  fundamental  neural  network  modules  that  carry  out  associative 
memory,  pattern  recognition,  and  category  learning. 

Even  then  this  is  a  big  subject.  To  help  organize  it  further,  I  will  trace  the 
historical  development  of  the  main  ideas,  grouped  by  theme  rather  than  by  strict 
chronological  order.  But  keep  in  mind  that  there  is  a  much  more  complex  history, 
and  many  more  contributors,  than  you  will  read  about  here.  I  refer  you  to  the 
Bibliography, in particular to the collection of articles in Neurocomputing:
Foundations of Research, edited by James A. Anderson and Edward Rosenfeld (MIT
Press, Cambridge, 1988).


2.  THE  McCULLOCH-PITTS  NEURON 

We would probably all agree to begin with the McCulloch-Pitts neuron (Figure
2(a)). The McCulloch-Pitts model describes a neuron whose activity $x_j$ is the sum
of inputs that arrive via weighted pathways. The input from a particular pathway is
an incoming signal $S_i$ multiplied by the weight $w_{ij}$ of that pathway. These weighted
inputs are summed independently. The outgoing signal $S_j = f(x_j)$ is typically a
nonlinear function (binary, sigmoid, or threshold-linear) of the activity $x_j$ in that
cell. The McCulloch-Pitts neuron can also have a bias term $\theta_j$, which is formally
equivalent to the negative of a threshold of the outgoing signal function.
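
The description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the particular sigmoid, and the convention that the binary signal fires when the activity exceeds zero are choices made here, not details taken from the original papers.

```python
import math

def activity(signals, weights, bias=0.0):
    """Activity x_j: independent sum of weighted inputs S_i * w_ij plus a bias theta_j."""
    return sum(s * w for s, w in zip(signals, weights)) + bias

# Three typical nonlinear signal functions f(x_j) (illustrative forms).
def binary(x):
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def threshold_linear(x):
    return max(0.0, x)

def neuron_output(signals, weights, bias, f):
    """Outgoing signal S_j = f(x_j)."""
    return f(activity(signals, weights, bias))

x = activity([1.0, 0.0, 1.0], [0.5, 0.5, -0.2], bias=-0.1)
s = neuron_output([1.0, 0.0, 1.0], [0.5, 0.5, -0.2], -0.1, binary)
```

With the binary signal function, a bias of -0.1 plays the same role as a firing threshold of +0.1 on the weighted sum, which is the equivalence noted in the text.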


3.  ADAPTIVE  FILTER  FORMALISM 

FIGURE 1 Levels of neural network analysis.

A very convenient notation for describing the McCulloch-Pitts neuron is the adaptive
filter. It is this notation that I will use here to translate models into
a common language so that we can compare and contrast them. The elementary
adaptive filter depicted in Figure 2(b) has:

1. a level $F_1$ that registers an input pattern vector;

2. signals $S_i$ that pass through weighted pathways; and

3. a second level $F_2$ whose activity pattern is here computed by the McCulloch-Pitts
function:

$$x_j = \sum_i S_i w_{ij} + \theta_j. \qquad (1)$$


FIGURE 2 The McCulloch-Pitts model (a) as a neuron, with typical
nonlinear signal functions; (b) as an adaptive filter.

The reason this formalism has proved so extraordinarily useful is that the $F_2$
level of the adaptive filter computes a pattern match, as in Eq. (2):

$$\sum_i S_i w_{ij} = \mathbf{S} \cdot \mathbf{w}_j = \|\mathbf{S}\|\,\|\mathbf{w}_j\| \cos(\mathbf{S}, \mathbf{w}_j). \qquad (2)$$

The independent sum of the weighted pathways in Eq. (2) equals the dot product of
the signal vector $\mathbf{S}$ times the weight vector $\mathbf{w}_j$. This term can be factored into the
“energy,” the product of the lengths of $\mathbf{S}$ and $\mathbf{w}_j$, times a dimensionless measure
of “pattern match,” the cosine of the angle between the two vectors. Suppose that
the weight vectors $\mathbf{w}_j$ are normalized and the bias terms $\theta_j$ are all equal. Then the
activity vector $\mathbf{x}$ across the second level describes the degree of match between the
signal vector $\mathbf{S}$ and the various weighted pathway vectors $\mathbf{w}_j$; the node with the
greatest activity indicates the weight vector that forms the best match.
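
As a small illustration of this pattern-match reading, the sketch below normalizes a few weight vectors and picks the second-level node with the greatest dot product. The helper names and the example vectors are assumptions made for this illustration; equal bias terms are taken to be zero.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def normalize(u):
    n = norm(u)
    return [a / n for a in u]

def best_match(signal, weight_vectors):
    """Return (index of the most active node, list of activities x_j = S . w_j)."""
    activities = [dot(signal, w) for w in weight_vectors]
    winner = max(range(len(activities)), key=activities.__getitem__)
    return winner, activities

S = [1.0, 2.0, 0.0]
W = [normalize(w) for w in ([1.0, 0.0, 0.0], [1.0, 2.0, 0.1], [0.0, 0.0, 1.0])]
winner, x = best_match(S, W)

# With unit-length weights, x_j / ||S|| is the cosine of the angle between S and w_j.
cosines = [xj / norm(S) for xj in x]
```

Here the second weight vector points almost in the same direction as S, so its node has the largest activity and a cosine close to 1.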


4.  LOGICAL  CALCULUS  AND  INVARIANT  PATTERNS 

The paper that first describes the McCulloch-Pitts model is entitled “A Logical
Calculus of the Ideas Immanent in Nervous Activity.” In that paper, McCulloch and
Pitts analyze the adaptive filter without adaptation. In their models, the weights
are constant. There is no learning. This 1943 paper shows that given the linear
filter with an absolute inhibition term:

$$x_j = \sum_i S_i w_{ij} + \theta_j - [\text{inhibition}] \qquad (3)$$

and binary output signals, these networks can be configured to perform arbitrary
logical functions. And if you are looking for applications of neural network research,
you need only read the memoirs of John von Neumann to see how heavily the
McCulloch-Pitts formalism influenced the development of present-day computer
architectures.
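
To make the claim about logical functions concrete, here is a minimal sketch of fixed-weight binary units wired as AND, OR, NOT, and, by composition, XOR. The specific weights, biases, and the way absolute inhibition is coded (any active inhibitory input vetoes firing) are assumptions chosen for this illustration; they are not taken from the 1943 paper.

```python
def mp_unit(excitatory, inhibitory, weights, bias):
    """Binary unit with constant weights and absolute inhibition (no learning)."""
    if any(inhibitory):
        return 0  # absolute inhibition: an active inhibitory input blocks firing
    x = sum(s * w for s, w in zip(excitatory, weights)) + bias
    return 1 if x > 0 else 0

def AND(a, b):
    return mp_unit([a, b], [], [1, 1], -1.5)

def OR(a, b):
    return mp_unit([a, b], [], [1, 1], -0.5)

def NOT(a):
    # Constant excitatory drive that is vetoed whenever a is active.
    return mp_unit([1], [a], [1], -0.5)

def XOR(a, b):
    # A two-stage network built only from the units above.
    return AND(OR(a, b), NOT(AND(a, b)))

truth_table = [(a, b, XOR(a, b)) for a in (0, 1) for b in (0, 1)]
```

Because AND, OR, and NOT are available, any Boolean function can in principle be assembled by composing such units.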

In a sense, however, McCulloch and Pitts were looking backwards, to the early
20th century mathematics of Principia Mathematica. A glance at the 1943 paper
shows that it is written in notation with which few of us are now familiar. (This is a
good example of revolutionary ideas being expressed in the language of a previous
era. As the revolution comes about a new language evolves, making the seminal
papers “hard to read.”) McCulloch and Pitts also clearly looked forward toward
present-day neural network research. For example, a later paper is entitled “How
We Know Universals: The Perception of Auditory and Visual Forms.” There
they examine ideas in pattern recognition and the computation of invariants. They
thus took their research program into a domain distinctly different from the earlier
analysis of formal network groupings and computation. Still, they considered only
models without learning.


5. PERCEPTRONS AND BACK-COUPLED ERROR
CORRECTION

The McCulloch-Pitts papers were extraordinarily influential, and it was not long
before the next generation of researchers added learning and adaptation. One great
figure of the next decade was Frank Rosenblatt, whose name is tied with the perceptron
model. Actually, “perceptron” refers to a large class of neural models. The
models that Rosenblatt himself developed and studied are numerous and varied;
see, for example, his book, Principles of Neurodynamics.

The core idea of the perceptron is the incorporation of learning into the
McCulloch-Pitts neuron model. Figure 3 illustrates the main elements of the perceptron,
including, in Rosenblatt’s terminology, the sensory unit (S); the association unit
(A), where the learning takes place; and the response unit (R).

FIGURE 3 Principal elements of a Rosenblatt perceptron: sensory unit (S), association
unit (A), and response unit (R).

One of the many perceptrons that Rosenblatt studied, one that remains important
to the present day, is the back-coupled perceptron. Figure 4(a) illustrates
a simple version of the back-coupled perceptron model, with a feedforward adaptive
filter and binary output signal. Weights $w_{ij}$ are adapted according to whether the
actual output $S_j$ matches a target output $b_j$ imposed on the system. The actual
output vector is subtracted from the target output vector; their difference is defined
as the error; and that difference is then fed back to adjust the weights, according to
some probabilistic law. Rosenblatt called this process back-coupled error correction.
It was well known at the time that these two-level perceptrons could sort linearly
separable inputs, which can be separated by a hyperplane in vector space, into two
classes. Figure 4(b) shows back-coupled error correction in more detail. In particular,
the error $\delta_j$ is fed back to every one of the weights converging on the $j$th node.
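
A minimal sketch of back-coupled error correction for a single binary output node follows. It assumes a deterministic update, a fixed learning rate, and a bias that is adapted like a weight on a constant input; Rosenblatt's own laws were probabilistic, so this is a simplified stand-in rather than his procedure.

```python
def step(x):
    return 1 if x > 0 else 0

def train_perceptron(patterns, n_inputs, rate=1.0, epochs=20):
    """patterns: list of (input vector a, target output b) with b in {0, 1}."""
    w = [0.0] * n_inputs
    theta = 0.0
    for _ in range(epochs):
        for a, b in patterns:
            s_out = step(sum(ai * wi for ai, wi in zip(a, w)) + theta)
            delta = b - s_out  # error: target output minus actual binary output
            # The same error is fed back to every weight converging on the node.
            w = [wi + rate * delta * ai for wi, ai in zip(w, a)]
            theta += rate * delta
    return w, theta

def classify(a, w, theta):
    return step(sum(ai * wi for ai, wi in zip(a, w)) + theta)

# A linearly separable example: logical OR of two binary inputs.
or_patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, theta = train_perceptron(or_patterns, n_inputs=2)
```

Weights change only when the actual output disagrees with the target, so training stops changing anything once every pattern is sorted correctly.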


6. ADALINE AND MADALINE

Research in the 1960s did not stop with these two-level perceptrons, but continued
on to multiple-level perceptrons, as indicated below. But first let us consider
another development that took place shortly after Rosenblatt’s perceptron
formulations. This is the set of models used by Bernard Widrow and his colleagues,
especially the adaline and madaline perceptrons. The adaline model has just one
neuron in the $F_2$ level in Figure 5; the madaline, or many-adaline, model has any
number of neurons in that level. Figure 5 highlights the principal difference
between the adaline/madaline and Rosenblatt’s two-level feedforward perceptron: an
adaline/madaline model compares the analog output $x_j$ with the target output $b_j$.
This comparison provides a more subtle index of error than a law that compares
the binary output with the target output. The error $\delta_j = b_j - x_j$ is fed back to
adjust weights using a Rosenblatt back-coupled error correction rule:


$$\frac{dw_{ij}}{dt} = \alpha S_i \delta_j. \qquad (4)$$

This rule minimizes the mean squared error:

$$\sum_j (b_j - x_j)^2, \qquad (5)$$

averaged over all inputs. It is therefore known as the least mean squared error
correction rule, or LMS.
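
The LMS idea can be sketched for a single adaline as a discrete-time update in which each weight moves by a small rate times the presynaptic signal times the analog error (target minus analog output). The discrete-time form, the rate of 0.1, and the toy data are assumptions made for this illustration.

```python
def lms_train(patterns, n_inputs, alpha=0.1, epochs=200):
    """patterns: list of (signal vector S, target b). Returns the learned weights."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for s, b in patterns:
            x = sum(si * wi for si, wi in zip(s, w))  # analog output x_j
            delta = b - x                             # analog error
            w = [wi + alpha * si * delta for wi, si in zip(w, s)]
    return w

def mean_squared_error(patterns, w):
    errors = [(b - sum(si * wi for si, wi in zip(s, w))) ** 2 for s, b in patterns]
    return sum(errors) / len(errors)

# Targets generated by the linear map b = 2*S_1 - 1*S_2, so zero error is reachable.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = lms_train(data, n_inputs=2)
mse = mean_squared_error(data, w)
```

Because the error is measured on the analog output rather than on a thresholded one, the weights keep improving even when a binary version of the output would already be correct.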

FIGURE 4 Back-coupled error correction. (a) The difference between the
target output and the actual output is fed back to adjust weights when
an error occurs. (b) All weights $w_{ij}$ fanning in to the $j$th node are adjusted
in proportion to the error $\delta_j$ at that node.

FIGURE 5 The adaline (one neuron) and madaline (many neurons) perceptrons use the
analog output $x_j$, rather than the binary output $S_j$, in the back-coupled error
correction procedure.

Once again, adaline and madaline provide many examples of the technological
spin-offs already generated by neural network research. Some of these are summarized
in an article by Widrow and Winter in a Computer special issue on artificial
neural systems. There the authors describe adaptive equalizers and adaptive echo
cancellation in modems, antennae, and other engineering applications, all directly
traceable to early neural network designs.


7. MULTI-LEVEL PERCEPTRONS: EARLY BACK
PROPAGATION

We have so far been discussing only two-level perceptrons. Rosenblatt, not content
with these, also studied multi-level perceptrons, as described in Principles of
Neurodynamics. One particularly interesting section in that book is entitled
“Back-Propagating Error Correction Procedures.” The back-propagation model described
in that section anticipates the currently used back-propagation model, which is
also a multi-level perceptron. In Chapter 13, Rosenblatt defines a back-propagation
algorithm that has, like most of his algorithms, a probabilistic learning law; he
proves a theorem about this system; and he carries out simulations. His chapter,
“Summary of Three-Layer Series-Coupled Systems: Capabilities and Deficiencies,”
is equally revealing. This chapter includes a hard look at what is lacking as well as
what is good in Rosenblatt’s back-propagation algorithm, and it puts the lie to the
myth that all of these systems were looked at only through rose-colored glasses.


8.  LATER  BACK  PROPAGATION 

Let us now move on to what has become one of the most useful and well-studied
neural network algorithms, the model we now call back propagation. This system
was first developed by Paul Werbos, as part of his Ph.D. thesis “Beyond
Regression: New Tools for Prediction and Analysis in the Behavioral Sciences”; and
independently discovered by David Parker. (See Werbos for a review of the
history of the development of back propagation.)

The most popular back-propagation examples carry out associative learning:
during training, a vector pattern $\mathbf{a}$ is associated with a vector pattern $\mathbf{b}$; and
subsequently $\mathbf{b}$ is recalled upon presentation of $\mathbf{a}$. The back-propagation system is
trained under conditions of slow learning, with each pattern pair $(\mathbf{a}, \mathbf{b})$ presented
repeatedly during training. The basic elements of a typical back-propagation system
are the McCulloch-Pitts linear filter with a sigmoid output signal function
and Rosenblatt back-coupled error correction. Figure 6 shows a block diagram of
a back-propagation system that is a three-level perceptron. The input signal vector
converges on the “hidden unit” $F_2$ level after passing through the first set of
weighted pathways $w_{ij}$. Signals $S_j$ then fan out to the $F_3$ level, which generates
the actual output of this feedforward system. A back-coupled error correction system
then compares the actual output $S_k$ with a target output $b_k$ and feeds back
their difference to all the weights $w_{jk}$ converging on the $k$th node. In this process
the difference $b_k - S_k$ is also multiplied by another term, $f'(x_k)$, computed
in a “differentiator” step. One function of this step is to ensure that the weights
remain in a bounded range: the shape of the sigmoid signal function implies that
weights $w_{jk}$ will stop growing if the magnitude of the activity $x_k$ becomes too large,
since then the derivative term $f'(x_k)$ goes to zero. Then there is a second way in
which the error correction is fed back to the lower level. This is where the term
“back propagation” enters: the weights in the feedforward pathways from $F_2$
to $F_3$ are now used in a second place, to filter error information. This process is
called weight transport. In particular, all the weights in pathways fanning out
from the $j$th node are transported for multiplication by the corresponding error
terms $\delta_k$; and the sum of all these products, times the bounding derivative term
$f'(x_j)$, is back-coupled to adjust all the weights $w_{ij}$ in pathways fanning in to the
$j$th $F_2$ node.

FIGURE 6 Block diagram of a back-propagation algorithm for associative memory.
Weights in the three-level feedforward perceptron are adjusted according to
back-coupled error correction rules. Weight transport propagates error information in
$F_2$-to-$F_3$ pathways back to weights in $F_1$-to-$F_2$ pathways.
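
The procedure just described can be sketched as a small three-level network. The class name, the absence of bias terms, the learning rate, the seeded random initial weights, and the toy pattern pairs are all assumptions made for this illustration; it is one common discrete-time form of back propagation, not a transcription of any particular published system.

```python
import math
import random

def f(x):
    """Sigmoid output signal function."""
    return 1.0 / (1.0 + math.exp(-x))

def f_prime(x):
    """Derivative of the sigmoid; goes to zero when |x| is large."""
    s = f(x)
    return s * (1.0 - s)

class ThreeLevelNet:
    def __init__(self, n1, n2, n3, seed=0):
        rng = random.Random(seed)
        self.w12 = [[rng.uniform(-0.5, 0.5) for _ in range(n2)] for _ in range(n1)]  # w_ij
        self.w23 = [[rng.uniform(-0.5, 0.5) for _ in range(n3)] for _ in range(n2)]  # w_jk

    def forward(self, a):
        n2, n3 = len(self.w23), len(self.w23[0])
        xj = [sum(a[i] * self.w12[i][j] for i in range(len(a))) for j in range(n2)]
        sj = [f(x) for x in xj]
        xk = [sum(sj[j] * self.w23[j][k] for j in range(n2)) for k in range(n3)]
        sk = [f(x) for x in xk]
        return xj, sj, xk, sk

    def train_pair(self, a, b, rate=0.5):
        xj, sj, xk, sk = self.forward(a)
        # Output-level error: (target - actual) times the "differentiator" term.
        delta_k = [(b[k] - sk[k]) * f_prime(xk[k]) for k in range(len(sk))]
        # Weight transport: the F2-to-F3 weights filter the error back to F2.
        delta_j = [f_prime(xj[j]) * sum(self.w23[j][k] * delta_k[k] for k in range(len(sk)))
                   for j in range(len(sj))]
        for j in range(len(sj)):
            for k in range(len(sk)):
                self.w23[j][k] += rate * sj[j] * delta_k[k]
        for i in range(len(a)):
            for j in range(len(sj)):
                self.w12[i][j] += rate * a[i] * delta_j[j]

    def error(self, pairs):
        return sum(sum((bk - sk) ** 2 for bk, sk in zip(b, self.forward(a)[3]))
                   for a, b in pairs)

pairs = [([0.0, 1.0], [1.0]), ([1.0, 0.0], [0.0])]
net = ThreeLevelNet(2, 2, 1)
before = net.error(pairs)
for _ in range(2000):          # slow learning: each pair is presented repeatedly
    for a, b in pairs:
        net.train_pair(a, b)
after = net.error(pairs)
```

Note that `delta_j` is computed from the F2-to-F3 weights before they are updated, which is the step the text calls weight transport.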


9.  HEBBIAN  LEARNING 

This brings us close to the present in this particular line of perceptron research. I
am now going to step back and trace another major neural network theme that goes
under the name Hebbian learning. One sentence in a 1949 book, The Organization
of Behavior by Donald Hebb, is responsible for the phrase Hebbian learning:

“When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or metabolic
change takes place in one or both cells such that A’s efficiency, as one of
the cells firing B, is increased.”


FIGURE 7 Donald Hebb provided a qualitative description of increases in path
strength that occur when cell A helps to fire cell B. In the adaptive filter formalism,
this hypothesis is often interpreted as a weight change that occurs when a presynaptic
signal $S_i$ is correlated with a postsynaptic activity $x_j$. (Figure panel: Hebbian
learning as a presynaptic/postsynaptic correlation, $dw_{ij}/dt = \alpha S_i x_j > 0$.)

Actually, “Hebbian learning” was not a new idea in 1949: it can be traced back
to Pavlov and earlier. But in the decade of McCulloch and Pitts, the formulation
of the idea in the above sentence crystallized the notion in such a way that it
became widely influential in the emerging neural network field. Translated into a
differential equation (Figure 7), the Hebbian rule computes a correlation between
the presynaptic signal $S_i$ and the postsynaptic activity $x_j$, with positive values of
the correlation term $S_i x_j$ leading to increases in the weight $w_{ij}$.

The Hebbian learning theme has since evolved in a number of directions. One
important development entailed simply adding a passive decay term to the Hebbian
correlation term:

$$\frac{dw_{ij}}{dt} = \alpha S_i x_j - w_{ij}. \qquad (6)$$

Other developments are described below. In all these rules, changes in the weight
$w_{ij}$ depend upon a simple function of the presynaptic signal $S_i$, the postsynaptic
activity $x_j$, and the weight itself, as in Eq. (6). In contrast, back-coupled error
correction requires a term that must be computed away from the target node and
then transmitted back to adjust the weight.
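
A quick numerical sketch shows what the passive decay term does. It integrates a Hebbian rule with decay, weight change = alpha * S_i * x_j - w_ij, by forward Euler steps with the signal and activity held constant; the step size, duration, and constant inputs are assumptions made for this illustration.

```python
def hebb_with_decay(w, S_i, x_j, alpha=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of dw/dt = alpha * S_i * x_j - w."""
    for _ in range(steps):
        w += dt * (alpha * S_i * x_j - w)
    return w

# Correlated pre- and postsynaptic activity: the weight rises toward alpha * S_i * x_j.
w_on = hebb_with_decay(0.0, S_i=1.0, x_j=0.5, alpha=2.0)

# No presynaptic signal: a previously learned weight passively decays toward 0.
w_off = hebb_with_decay(1.0, S_i=0.0, x_j=0.5, alpha=2.0)
```

With the decay term the weight settles at a level proportional to the correlation instead of growing without bound, and everything the update needs is available locally at the synapse.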


10.  THE  LEARNING  MATRIX 

Many of the models that followed the perceptron in the 1950s and 1960s can be
phrased in Hebbian (plus McCulloch-Pitts) language. One of the earliest and most
important is the learning matrix (Figure 8) developed by K. Steinbuch. The
function of the learning matrix is to sort, or partition, a set of vector patterns
into categories. In the simple learning matrix illustrated in Figure 8(a), an input
pattern $\mathbf{a}$ is represented in the vertical wires. During learning a category for $\mathbf{a}$ is
represented in the horizontal wires of the crossbar: $\mathbf{a}$ is placed in category $J$ when
the $J$th component of the output vector $\mathbf{b}$ is set equal to 1. During such an input
presentation, the weight $w_{iJ}$ is adjusted upward by a fixed amount if $a_i = 1$ and
downward by the same amount if $a_i = 0$. Then during performance the weights
are held constant; and an input $\mathbf{a}$ is deemed to be in category $J$ if the weight vector
$\mathbf{w}_J = (w_{1J}, \ldots, w_{NJ})$ is closer than any other weight vector to $\mathbf{a}$, according to some
measure of distance.

Recasting the crossbar learning matrix in the adaptive filter format (Figure
8(b)) helps us to see that this simple model is the precursor of a fundamental
module widely used in present-day neural network modeling, namely competitive
learning. In particular, activity at the top level of the learning matrix corresponds
to a category representation. Setting activity $x_J$ equal to 1, while all other $x_j$’s
are set equal to 0, corresponds to the dynamics of a choice, or winner-take-all,
neural network. Steinbuch’s learning rule can also be translated into the Hebbian
formalism, with weight adjustment during learning a joint function of a presynaptic
signal $S_i = (2a_i - 1)$ and a postsynaptic signal $x_j = b_j$. (This rule is not strictly
Hebbian since weights can decrease as well as increase.) Then during performance,
weight changes are prevented; a new signal function $S_i = a_i$ is chosen; and an $F_2$
choice rule is imposed based, for example, on the dot product measure illustrated
in Figure 2(b).

FIGURE 8 The learning matrix, for category learning. (a) Cross-bar architecture for
electronic implementation. (b) The learning matrix in adaptive filter notation.
The learning matrix was a precursor of the competitive learning paradigm.
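
The two phases of the learning matrix can be sketched as follows. During learning, the weights of the chosen category move up by a fixed amount where the input component is 1 and down where it is 0; during performance, the weights are frozen and a winner-take-all choice is made with a dot product. The function names, the unit step size, and the toy patterns are assumptions made for this illustration.

```python
def train_learning_matrix(examples, n_inputs, n_categories, step=1.0):
    """examples: list of (binary input vector a, category index J)."""
    w = [[0.0] * n_inputs for _ in range(n_categories)]
    for a, J in examples:
        for i, a_i in enumerate(a):
            # Presynaptic signal 2*a_i - 1 is +1 or -1; only category J is active.
            w[J][i] += step * (2 * a_i - 1)
    return w

def classify(a, w):
    """Performance mode: weights fixed, signal a_i, choice by largest dot product."""
    scores = [sum(a_i * w_i for a_i, w_i in zip(a, row)) for row in w]
    return max(range(len(scores)), key=scores.__getitem__)

examples = [([1, 1, 0, 0], 0), ([1, 0, 0, 0], 0),
            ([0, 0, 1, 1], 1), ([0, 1, 1, 1], 1)]
w = train_learning_matrix(examples, n_inputs=4, n_categories=2)
category = classify([0, 0, 0, 1], w)
```

The explicit switch between a training function and a classification function mirrors the externally imposed learning and performance modes described in the text.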

A model comparative analysis of the learning matrix and the madaline models
and their electronic implementations can be found in a paper by K. Steinbuch and
B. Widrow. This paper, entitled “A critical comparison of two kinds of adaptive
classification networks,” carries out a side-by-side analysis of the learning matrix
and the madaline, tracing the two models’ capabilities, similarities, and differences.


11.  LINEAR  ASSOCIATIVE  MEMORY  (LAM) 

We will now move to a different line of research, namely the linear associative
memory (LAM) models. Pioneering work on these models was done by J. Anderson,
T. Kohonen, and K. Nakano. Subsequently, many other linear associative memory
models were developed and analyzed, for example by Kohonen and his
collaborators, who studied LAMs with iteratively computed weights that converge to the
Moore-Penrose pseudoinverse. This latter system is optimal with respect to the
LMS error (5), and so is known as the optimal linear associative memory (OLAM)
model. Variations included networks with partial connectivity, probabilistic learning
laws, and nonlinear perturbations.

At the heart of all these variations is a very simple idea, namely that a set of
pattern pairs $(\mathbf{a}^{(p)}, \mathbf{b}^{(p)})$ can be stored as a correlation weight matrix:

$$w_{ij} = \sum_p a_i^{(p)} b_j^{(p)}, \qquad (7)$$

where the sum runs over all patterns $p$.

The LAMs have been an enduringly useful class of models because, in addition
to their great simplicity, they embody a sort of perfection. Namely, perfect recall
is achieved, provided the input vectors $\mathbf{a}^{(p)}$ are mutually orthogonal. In this case,
during performance, presentation of the pattern $\mathbf{a}^{(p)}$ yields an output vector $\mathbf{x}$
proportional to $\mathbf{b}^{(p)}$, as follows:

$$x_j = \mathbf{a}^{(p)} \cdot \mathbf{w}_j = \sum_i a_i^{(p)} w_{ij} = \sum_q \left( \mathbf{a}^{(p)} \cdot \mathbf{a}^{(q)} \right) b_j^{(q)}. \qquad (8)$$

If, then, the vectors $\mathbf{a}^{(q)}$ are mutually orthogonal, the last sum in Eq. (8) reduces
to a single term, with

$$x_j = \|\mathbf{a}^{(p)}\|^2\, b_j^{(p)}. \qquad (9)$$

Thus the output vector $\mathbf{x}$ is directly proportional to the desired output vector,
$\mathbf{b}^{(p)}$. Finally, if we once again cast the LAM in the adaptive filter framework, we see that
it is a Hebbian learning model (Figure 9).
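
The orthogonality argument is easy to check numerically. The sketch below stores two pattern pairs as a sum of outer products and recalls each output from its input; the function names and the example vectors are assumptions made for this illustration.

```python
def store(pairs):
    """Correlation weight matrix: w[i][j] = sum over pairs of a_i * b_j."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0.0] * m for _ in range(n)]
    for a, b in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += a[i] * b[j]
    return w

def recall(a, w):
    """Output x_j = sum_i a_i * w[i][j], with weights held constant."""
    return [sum(a[i] * w[i][j] for i in range(len(a))) for j in range(len(w[0]))]

# Two mutually orthogonal input vectors, each paired with an output vector.
pairs = [([1.0, 1.0, 0.0, 0.0], [1.0, 0.0]),
         ([0.0, 0.0, 2.0, 0.0], [0.0, 3.0])]
w = store(pairs)
x1 = recall(pairs[0][0], w)   # squared length 2 times [1, 0]
x2 = recall(pairs[1][0], w)   # squared length 4 times [0, 3]

# A non-orthogonal probe mixes the stored outputs (crosstalk).
x_mixed = recall([1.0, 0.0, 1.0, 0.0], w)
```

Each recalled vector is the stored output scaled by the squared length of the input, and an input that overlaps both stored patterns returns a blend of both outputs.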


FIGURE 9 A linear associative memory network, in adaptive filter/Hebbian learning
format. Learning (Hebbian): each pair $(\mathbf{a}^{(p)}, \mathbf{b}^{(p)})$ is presented for 1 time unit,
giving $w_{ij} = \sum_p a_i^{(p)} b_j^{(p)}$. Performance: $x_j = \sum_i a_i w_{ij}$, with $dw_{ij}/dt = 0$.


12. REAL-TIME MODELS AND EMBEDDING FIELDS

Most of the models we have so far discussed require external control of system
dynamics. In the back-propagation model shown in Figure 6, for example, the initial
feedforward activation of the three-level perceptron is followed by error correction
steps that require either weight transport or reversing the direction of flow of
activation. In the linear associative memory model in Figure 9, dynamics are altered as
the system moves from its learning mode to its performance mode. During learning,
activity $x_j$ at the output level $F_2$ is set equal to the desired output $b_j$, while the
input $\sum_i a_i w_{ij}$ coming to that level from $F_1$ through the adaptive filter is suppressed.
During performance, in contrast, the dynamics are reversed: weight changes are
suppressed and the adaptive filter input determines $x_j$.

The phrase real time describes neural network models that require no external
control of system dynamics. (Real time is alternatively used to describe any
system that is able to process inputs as fast as they arrive.) Differential equations
constitute the language of real-time models. A real-time model may or may not
have an external teaching input, like the vector $\mathbf{b}$ of the LAM model; and learning
may or may not be shut down after a finite time interval. A typical real-time
model is illustrated in Figure 10. There, excitatory and inhibitory inputs could be
either internal or external to the model, but, if present, the influence of a signal is
not selectively ignored. Moreover, the learning rate $\epsilon(t)$ might, say, be constant or
decay to 0 through time, but does not require algorithmic control. The dynamics of
performance are described by the same set of equations as the dynamics of learning.
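
The following sketch illustrates the real-time idea for one node and one weight: activation and learning are integrated together by the same pair of equations at every step, with no switch between modes. The additive form used for the activity (decay plus excitatory minus inhibitory input), the Hebbian-with-decay choice for the learning function, the exponentially decaying learning rate, and all numerical values are assumptions made for this illustration.

```python
import math

def simulate(excitatory, inhibitory, S_i, dt=0.01, steps=3000, eps0=0.5, tau=5.0):
    """Forward-Euler integration of coupled activation and learning equations."""
    x, w, t = 0.0, 0.0, 0.0
    for _ in range(steps):
        eps = eps0 * math.exp(-t / tau)       # learning rate decays to 0 through time
        dx = -x + excitatory - inhibitory     # additive activation equation (assumed form)
        dw = eps * (S_i * x - w)              # assumed choice of the learning function
        x += dt * dx
        w += dt * dw
        t += dt
    return x, w

x_final, w_final = simulate(excitatory=1.0, inhibitory=0.25, S_i=1.0)
```

The activity settles at the difference between the excitatory and inhibitory inputs, while the weight tracks the activity more and more slowly as the learning rate decays, without any external command to stop learning.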

Real-time modeling has characterized the work of Stephen Grossberg over the
past thirty years, work that in its early stages was called a theory of embedding
fields. These early real-time models, as well as the more recent systems developed
by Grossberg and his colleagues at the Boston University Center for Adaptive
Systems, portray the inextricable linking of fast nodal activation and slow weight
adaptation. There is no externally imposed distinction between a learning mode
and a performance mode.


13.  INSTARS  AND  OUTSTARS 

Two key components of embedding field systems are the instar and the
outstar. Figure 11 illustrates the fan-in geometry of the instar and the fan-out
geometry of the outstar.

Instars often appear in systems designed to carry out adaptive coding, or
content-addressable memory (CAM). For example, suppose that the incoming
weight vector $(w_{1J}, \ldots, w_{NJ})$ approaches the incoming signal vector $(S_1, \ldots, S_N)$
while an input vector $\mathbf{a}$ is present at $F_1$; and that the weight and signal vectors
are normalized. Then Eq. (2) implies that the filtered input $\sum_i S_i w_{iJ}$ to the $J$th
$F_2$ node approaches its maximum value during learning. Subsequent presentation
of the same $F_1$ input pattern $\mathbf{a}$ maximally activates the $J$th $F_2$ node; that is, the
“content addresses the memory,” all other things being equal.

FIGURE 10 Elements of a typical real-time model, with additive activation equations.
Activation equation (additive model):
$\frac{dx_j}{dt} = -x_j + \sum [\text{excitatory inputs}] - \sum [\text{inhibitory inputs}]$.
Learning equation: $\frac{dw_{ij}}{dt} = \epsilon(t)\, F(S_i, x_j, w_{ij})$.

The outstar, which is dual to the instar, carries out spatial pattern learning.
For example, suppose that the outgoing weight vector $(w_{J1}, \ldots, w_{JN})$ approaches
the $F_1$ spatial activity pattern $(x_1, \ldots, x_N)$ while an input vector $\mathbf{a}$ is present. Then
subsequent activation of the $J$th $F_2$ node transmits to $F_1$ the signal pattern
$(S_J w_{J1}, \ldots, S_J w_{JN}) = S_J (w_{J1}, \ldots, w_{JN})$, which is directly proportional to the prior $F_1$
spatial activity pattern $(x_1, \ldots, x_N)$, even though the input vector is now absent;
that is, the “memory addresses the content.”

The upper instar and outstar in Figure 11 are examples of heteroassociative
memories, where the field F1 of nodes indexed by i is disjoint from the field F2
of nodes indexed by j. In general, these fields can overlap. The important special
case in which the two fields coincide is called autoassociative memory, also shown
in Figure 11. Powerful computational properties arise when neural network
architectures are constructed from a combination of instars and outstars. We will
later see some of these designs.


[Figure 11 panels. INSTAR (fan-in): adaptive coding, content-addressable.
OUTSTAR (fan-out): spatial pattern learning. Index sets: heteroassociative,
I ∩ J = ∅; autoassociative, I = J (instar = outstar).]

FIGURE 11 Heteroassociative and autoassociative instars and outstars, for adaptive
coding and spatial pattern learning.


14.  ADDITIVE  AND  SHUNTING  ACTIVATION  EQUATIONS

The outstar and the instar have been studied in great detail and with various
combinations of activation, or short-term memory, equations and learning, or
long-term memory, equations. One activation equation, the additive model, is
illustrated in Figure 10. There, activity at a node is proportional to the difference
between the net excitatory input and the net inhibitory input. Most of the models
discussed so far employ a version of the additive activation model. For example,
the McCulloch-Pitts activation equation (3) is the steady state of the additive
equation (10):

    dx_j/dt = -x_j + [excitation] - [inhibition].    (10)

Grossberg reviews a number of neural models that are versions of the additive
equation.

An  important  generalization  of  the  additive  model  is  the  shunting  model.  In  a 
shunting  network,  excitatory  inputs  drive  activity  toward  a  finite  maximum,  while 
inhibitory  inputs  drive  activity  toward  a  finite  minimum,  as  in  Eq.  (11): 


    dx_i/dt = -x_i + (A - x_i)[excitatory inputs] - (B + x_i)[inhibitory inputs].    (11)


In Eq. (11), activity x_i remains in the bounded range (-B, A) and decays to the
resting level 0 in the absence of all inputs. In addition, shunting equations display
other crucial properties such as normalization and automatic gain control. Finally,
shunting network equations mirror the underlying physiology of single nerve cell
dynamics, as summarized by the Hodgkin-Huxley equations:

    dV/dt = -V + (V_Na - V)[sodium conductance] - (V_K + V)[potassium conductance].    (12)

In this single nerve cell model, during depolarization, sodium ions entering across
the membrane drive the potential V toward the sodium equilibrium potential V_Na;
during repolarization, exiting potassium ions drive the potential toward the
potassium equilibrium potential -V_K; and in the balance the cell is restored to its
resting potential, which is here set equal to 0. In 1963 A. L. Hodgkin and A. F. Huxley
won the Nobel Prize for their development of this classic neural model.
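
The bounded-range property of the shunting model can be illustrated with a short simulation. This is a sketch under stated assumptions: it takes the shunting equation in the form dx/dt = -x + (A - x)E - (B + x)I with constant excitatory input E and inhibitory input I, integrates it by the Euler method, and uses illustrative values for A, B, and the inputs.

```python
def shunting_steady_state(excitatory, inhibitory, A=1.0, B=0.5, x0=0.0, steps=3000):
    """Euler-integrate dx/dt = -x + (A - x)*E - (B + x)*I with constant inputs."""
    rate = 1.0 + excitatory + inhibitory      # total decay rate of this linear equation
    dt = 0.01 / rate                          # step scaled so the integration stays stable
    x = x0
    for _ in range(steps):
        x += dt * (-x + (A - x) * excitatory - (B + x) * inhibitory)
    return x

# Very large inputs still leave the activity inside the range (-B, A).
x_excited = shunting_steady_state(excitatory=1000.0, inhibitory=0.0)
x_inhibited = shunting_steady_state(excitatory=0.0, inhibitory=1000.0)
# With no inputs the activity decays from 0.8 back to the resting level 0.
x_rest = shunting_steady_state(excitatory=0.0, inhibitory=0.0, x0=0.8)
```

In this sketch the equilibrium is (A E - B I) / (1 + E + I), which approaches A or -B as one input grows but never reaches either bound.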


15.  LEARNING  EQUATIONS

A wide variety of learning laws for instars and outstars have also been studied. One
example is the Hebbian correlation + passive decay equation (6). There, the weight
w_ij computes a long-term weighted average of the product of presynaptic activity
S_i and postsynaptic activity x_j.

A typical learning law for instar coding is given by Eq. (13):

    dw_ij/dt = ε x_j [S_i - w_ij].    (13)

Suppose, for example, that the Jth node is to represent a given category.
According to Eq. (13), the weight vector (w_1J, ..., w_NJ) converges to the signal vector
(S_1, ..., S_N) when the Jth node is active; but that weight vector remains unchanged
when a different category representation is active. The term x_j thus buffers, or
gates, the weights w_ij against undesired changes, including memory loss due to
passive decay. On the other hand, a typical learning law for outstar pattern
learning is given by Eq. (14):

    dw_ji/dt = ε S_j [x_i - w_ji].    (14)

In Eq. (14), when the Jth F2 node is active the weight vector (w_J1, ..., w_JN)
converges to the F1 activity pattern vector (x_1, ..., x_N). Again, a gating term buffers
weights against inappropriate changes. Note that the pair of learning laws described
by Eqs. (13) and (14) are non-Hebbian, and are also non-symmetric. That is, w_ij
is generally not equal to w_ji, unless the F1 and F2 signal vectors S are identical to
the corresponding activity vectors x.
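
The gating behavior of these learning laws can be sketched in a few lines. The code assumes a gated form dw/dt = rate * gate * (target - w), which matches the behavior described in the text (for the instar, gate = x_J and target = S; for the outstar, gate = S_J and target = x). The rate constant, step size, and sample vectors are illustrative assumptions.

```python
def gated_step(weights, target, gate, rate=0.5, dt=0.1):
    """One Euler step of dw/dt = rate * gate * (target - w) for each weight."""
    return [w + dt * rate * gate * (t - w) for w, t in zip(weights, target)]

signal = [0.2, 0.7, 0.1]              # pattern to be coded
w_active = [0.0, 0.0, 0.0]            # weights of the active category (gate = 1)
w_inactive = [0.9, 0.05, 0.05]        # weights of an inactive category (gate = 0)

for _ in range(400):
    w_active = gated_step(w_active, signal, gate=1.0)
    w_inactive = gated_step(w_inactive, signal, gate=0.0)
```

The active weight vector converges to the signal, while the inactive one is left exactly as it was: with the gate closed there is neither learning nor passive decay.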

A series of theorems encompassing neural network pattern learning by systems
employing a large class of these and other activation and learning laws was proved
by Grossberg in the late 1960s and early 1970s. One set of results falls under the
heading outstar learning theorems. One of the most general of these theorems is
contained in an article entitled “Pattern Learning by Functional-Differential
Neural Networks with Arbitrary Path Weights.” This is reprinted in Studies of Mind
and Brain, which also contains articles that introduce and analyze additive and
shunting equations (10) and (11); learning with passive and gated memory decay
laws (6), (13), and (14); outstar and instar modules; and neural network
architectures constructed from these elements.

FIGURE 12 The avalanche: a neural network capable of learning and performing an
arbitrary space-time pattern.


16.  LEARNING  SPACE-TIME  PATTERNS:  THE  AVALANCHE 

While most of the neural network models discussed in this article are designed
to learn spatial patterns, problems such as speech recognition and motor learning
require an understanding of space-time patterns as well. An early neural network
model, called the avalanche, is capable of learning and performing an arbitrary
space-time pattern. In essence, an avalanche is a series of outstars (Figure 12).
During learning, the outstar active at time t learns the spatial pattern x(t)
generated by the input pattern vector a(t). It is useful to think of x(t) as the pattern
determining finger positions for a piano piece: the same field of cells is used over
and over, and the sequence ABC is not the same as CBA. Following learning, when
no input patterns are present, activation of the sequence of outstars reads out, or
“performs,” the space-time pattern it had previously learned. In its minimal form,
this network can be realized as a single cell with many branches. Learning and
performance can also be supervised by a nonspecific GO signal. The GO signal may
terminate an action sequence at any time and otherwise modulate the performance
energy and velocity. In general, the order of activation of the outstars, as well as
the spatial patterns themselves, need to be learned. This can be accomplished
using autoassociative networks, as in the theory of serial learning or adaptive signal
processing.
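
The idea of an avalanche as a series of outstars can be sketched as follows. This is a simplified illustration, not the original model: time is discretized, one outstar is assigned to each time step, the order of activation is fixed rather than learned, and the GO signal is omitted. The function names and learning rate are assumptions.

```python
def learn_avalanche(patterns, rate=0.5, sweeps=60):
    """Train one outstar per time step; outstar k learns the pattern x(t_k)."""
    outstars = [[0.0] * len(p) for p in patterns]
    for _ in range(sweeps):
        for k, pattern in enumerate(patterns):
            # Only the outstar that is active at step k changes its weights.
            outstars[k] = [w + rate * (x - w) for w, x in zip(outstars[k], pattern)]
    return outstars

def perform(outstars):
    """Activate the outstars in order, with no input present, and read out their weights."""
    return [list(weights) for weights in outstars]

note_a, note_b, note_c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
played_abc = perform(learn_avalanche([note_a, note_b, note_c]))
played_cba = perform(learn_avalanche([note_c, note_b, note_a]))
```

Both sequences use the same field of three output cells, yet the two performances differ because the order in which the outstars fire is part of what is stored.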


17.  ADAPTIVE  CODING  AND  CATEGORY  FORMATION 

Let us now return to the theme of adaptive coding and category formation,
introduced earlier in our discussion of Steinbuch’s learning matrix. As shown in Figure
8(b), the learning matrix can be recast in the adaptive filter formalism, with the
dynamics of the F2 level defined in such a way that only one node is active at a
given time. The active node, or category representation, is selected by a “teacher”
during learning. During performance the active node is selected according to which
weight vector forms the best match with the input vector. Now compare the
learning matrix in Figure 8(b) with the instar in Figure 11. The pictures, or network
“anatomies,” seem to indicate that the instar is identical to the learning matrix.
The difference between the two models lies in the dynamics, or network
“physiology.” The fundamental characteristic of the instar that distinguishes it from the
learning matrix and other early models is the constraint that instar dynamics occur
in real time. In particular, the instar filtered input S · w_j influences x_j at all times,
and is not artificially suppressed during learning. However, the desire to construct
a category learning system that can operate in real time immediately leads to many
questions. The most pressing one is: how can the categories be represented if the
dynamics are not imposed by an external agent? For the choice case, for example,
the internal system dynamics need to allow at most one F2 node to be active, even
though other nodes may continue to receive large inputs, either internally, via the
filter, or externally, via the vector b. Even when the category representation is
a distributed pattern, this representation is generally a compressed, or
contrast-enhanced, version of the highly distributed net pattern coming in to F2 from all
sources. This compression is, in fact, the step that carries out the process wherein
some or many items are grouped into a new unit, or category.

[Figure 13 equation:
    dx_i/dt = -x_i + (A - x_i)[I_i + f(x_i)] - x_i Σ_{j≠i} f(x_j).]

FIGURE 13 An on-center/off-surround shunting competitive network. Qualitative
features of the signal function f(x_j) determine the way in which the network transforms
the input vector I into the state vector x.


18.  SHUNTING  COMPETITIVE  NETWORKS 

Fortunately, there is a well-defined class of neural networks ideally suited to play the
role of the category representation field. This is the class of on-center/off-surround
shunting competitive networks. Figure 13 illustrates one such system. There, the
input vector I can be the sum of inputs from one or more sources and is, in general,
highly distributed. On-center here refers to the feedback process whereby a cell
sends net excitatory signals to itself and to its immediate neighbors; off-surround
refers to the complementary process whereby the same cell sends net inhibitory
signals to its more distant neighbors. In a 1973 article entitled “Contour Enhancement,
Short-Term Memory, and Constancies in Reverberating Neural Networks,”
Grossberg carried out a mathematical characterization of the dynamics of various classes
of shunting competitive networks. In particular he classified the systems according
to the shape of the signal function f(x_j). Depending upon whether this signal
function is linear, faster-than-linear, slower-than-linear, or sigmoid, the networks
are shown to quench or enhance low-amplitude noise, and to contrast-enhance or
flatten the input pattern I in varying degrees. In particular, a faster-than-linear
signal function implements the choice network needed for many models of
category learning. A sigmoid signal function, on the other hand, suppresses noise and
contrast-enhances the input pattern, without necessarily going to the extreme of
concentrating all activity in one node. Thus an on-center/off-surround shunting
competitive network with a sigmoid signal function is shown to be an ideal design
for a category learning system with distributed code representations. This
parametric analysis thus provided the foundation for constructing larger network
architectures that use a competitive network as a component with well-defined functional
properties.
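
The effect of the signal function can be explored with a small simulation. The sketch below assumes the network equation dx_i/dt = -x_i + (A - x_i) f(x_i) - x_i Σ_{j≠i} f(x_j), that is, a shunting on-center/off-surround network whose external inputs have been switched off after an initial pattern is stored. The Euler integration, the value A = 10, the initial pattern, and the two signal functions (x squared as a faster-than-linear example, x as the linear case) are illustrative assumptions.

```python
def run_competition(x0, signal, A=10.0, dt=0.001, steps=20000):
    """Euler-integrate dx_i/dt = -x_i + (A - x_i) f(x_i) - x_i * sum_{j != i} f(x_j)."""
    x = list(x0)
    for _ in range(steps):
        f = [signal(v) for v in x]
        total = sum(f)
        x = [v + dt * (-v + (A - v) * fv - v * (total - fv)) for v, fv in zip(x, f)]
    return x

def shares(x):
    """Relative activities x_i / sum(x), i.e. the normalized pattern."""
    total = sum(x)
    return [v / total for v in x]

initial = [1.0, 1.2, 2.0, 1.1]
chosen = run_competition(initial, lambda v: v * v)     # faster-than-linear signal
preserved = run_competition(initial, lambda v: v)      # linear signal
```

In this sketch the faster-than-linear signal concentrates essentially all activity in the node that started with the largest value, which is the choice behavior described above, while the linear signal leaves the relative pattern unchanged.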


19.  COMPETITIVE  LEARNING 

A module of fundamental importance in recent neural network architectures is
described by the phrase competitive learning. This module brings the properties of
the learning matrix into the real-time setting. The basic competitive learning
architecture consists of an instar filter, from a field F1 to a field F2, and a competitive
neural network at F2 (Figure 14). The competitive learning module can operate with
or without an external teaching signal b, and learned changes in the adaptive filter
can proceed indefinitely or cease after a finite time interval. If there is no teaching
signal at a given time, then the net input vector to F2 is the sum of signals arriving
via the adaptive filter. Then, if the category representation network is designed to
make a choice, the node that automatically becomes active is the one whose weight
vector best matches the signal vector, as in Eq. (2). If there is a teaching signal, the
category representation decision still depends on past learning, but this is balanced
against the external signal b, which may or may not overrule the past in the
competition. In either case, an instar learning law such as Eq. (13) allows a chosen
category to encode aspects of the new F1 pattern in its learned representation.
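
A choice-based competitive learning step can be sketched as below. This is a simplified, discrete-time illustration: the competition is replaced by picking the node with the largest net input, the teaching signal b is modeled as an additive bias on that net input, and the learning rate is arbitrary. All three are assumptions, not details given in the text.

```python
def choose(signal, weight_vectors, teacher=None):
    """Index of the F2 node with the largest net input (filtered input plus any teaching signal)."""
    nets = [sum(s * w for s, w in zip(signal, wv)) for wv in weight_vectors]
    if teacher is not None:
        nets = [n + b for n, b in zip(nets, teacher)]
    return max(range(len(nets)), key=nets.__getitem__)

def learn(signal, weight_vectors, teacher=None, rate=0.2):
    """Choose a category, then move only its weight vector toward the signal."""
    J = choose(signal, weight_vectors, teacher)
    weight_vectors[J] = [w + rate * (s - w) for w, s in zip(weight_vectors[J], signal)]
    return J

weights = [[1.0, 0.0], [0.0, 1.0]]
signal = [0.9, 0.1]
unsupervised_choice = learn(signal, weights)                     # node 0 matches best
overruled_choice = choose(signal, weights, teacher=[0.0, 2.0])   # teacher favors node 1
```

Without a teacher the best-matching node wins and only its weights move toward the signal; a sufficiently strong teaching signal can overrule that past learning in the competition.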


20.  COMPUTATIONAL  MAPS 

Investigators who have developed and analyzed the competitive learning paradigm
over the years include K. Steinbuch; S. Grossberg; C. von der Malsburg;
S.-I. Amari; S.-I. Amari and A. Takeuchi; E. Bienenstock, L. Cooper, and P.
Munro; D. Rumelhart and D. Zipser; and many others. Moreover, these and
other investigators proceeded to embed the competitive learning module in
higher-order neural network systems. In particular, systems were designed to learn
computational maps, producing an output vector b in response to an input vector a.
The core of many of these computational map models is an instar-outstar system.
Recognition of this common theme highlights the models’ differences as well as their
similarities. An early self-organizing three-level instar-outstar computational map
model was described by Grossberg, who later replaced the instar portion of this
model with a competitive learning module. The self-organizing feature map and
the counter-propagation network are also examples of instar-outstar competitive
learning models.

[Figure 14 label: INSTAR + CONTRAST ENHANCEMENT.]

FIGURE 14 The basic competitive learning module combines the instar pattern coding
system with a competitive network that contrast-enhances its filtered input.

FIGURE 15 A three-level, feedforward instar-outstar module for computational
mapping. The competitive learning module (F1 and F2) is joined with an outstar-type
fan-out, for spatial pattern learning.


The basic instar-outstar computational map system is depicted in Figure 15.
The first two levels, F1 and F2, form a competitive learning system. Included are the
fan-in adaptive filter, contrast enhancement at the “hidden” level F2, and a
learning law for instar coding of the input patterns a. The top two levels then employ
a fan-out adaptive filter for outstar pattern learning of the vector b. This three-level
architecture allows, for example, two very different input patterns to map to the
same output pattern: each input pattern can activate its own compressed
representation at F2, while each of these F2 representations can learn a common output
vector. In the extreme case where each input vector a activates its own F2 node
the system learns any desired output. The generality of this extreme case, which
implements an arbitrary mapping from R^m to R^n, is offset by its lack of
generalization, or continuity, as well as by the fact that each learned pair (a,b) requires
its own F2 node. Distributed F2 representations provide greater generalization and
efficiency, at a cost in complete a priori generality of the mapping.
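
The extreme case, in which each input activates its own F2 node, can be sketched as a small class. This is an illustrative discrete-time simplification: the F2 competition is reduced to a winner-take-all choice, the initial instar weights are chosen by hand so that the two inputs select different nodes, and the class name, learning rate, and sample vectors are assumptions.

```python
class InstarOutstarMap:
    """Three-level map: instar filter F1 -> F2, choice at F2, outstar filter F2 -> output."""

    def __init__(self, instar_weights, output_size):
        self.instar = [list(w) for w in instar_weights]          # one vector per F2 node
        self.outstar = [[0.0] * output_size for _ in instar_weights]

    def winner(self, a):
        nets = [sum(x * w for x, w in zip(a, wv)) for wv in self.instar]
        return max(range(len(nets)), key=nets.__getitem__)

    def train(self, a, b, rate=0.5):
        J = self.winner(a)
        # Instar coding of the input a and outstar learning of the output b, both gated by node J.
        self.instar[J] = [w + rate * (x - w) for w, x in zip(self.instar[J], a)]
        self.outstar[J] = [w + rate * (y - w) for w, y in zip(self.outstar[J], b)]
        return J

    def recall(self, a):
        return list(self.outstar[self.winner(a)])

net = InstarOutstarMap([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], output_size=2)
a1, a2, b = [1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.5, 0.5]
for _ in range(50):
    node_1 = net.train(a1, b)
    node_2 = net.train(a2, b)
```

Two quite different inputs end up coded by different F2 nodes, and both nodes learn the same output vector, so either input recalls b.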


21.  INSTABILITY  OF  COMPUTATIONAL  MAPS 

The  widespread  use  of  instar-outstar  families  of  computational  maps  attests  to  the 
power  of  this  basic  neural  network  architecture.  This  power  is,  however,  diminished 
by  the  instability  of  feedforward  systems:  in  general,  recently  learned  patterns  tend 
to  erode  past  learning.  This  instability  arises  from  two  sources.  First,  even  if  a 
chosen  category  is  the  best  match  for  a  given  input,  that  match  may  nevertheless 
be  a  poor  one,  chosen  only  because  all  the  others  are  even  worse.  Established  codes 
are  thus  vulnerable  to  recoding  by  “outliers.”  Second,  learning  laws  such  as  Eq.  (13) 
imply  that  a  weight  vector  tends  toward  a  new  vector  that  encodes  the  presently 
active pattern, thereby weakening the trace of the past. Thus weight vectors can
eventually  drift  far  from  their  original  patterns,  even  if  learning  is  very  slow  and 
even  if  each  individual  input  makes  a  good  match  with  the  past  as  recorded  in  the 
weights. 
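
The second source of instability, gradual drift, can be reproduced in a toy setting. The sketch below assumes a single category whose weight vector tracks each input with a discrete instar-style update, and a sequence of two-dimensional unit-length inputs that rotate by a tiny angle at each presentation. The learning rate, number of presentations, and the use of cosine similarity as the match measure are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors, used here as the match measure."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

original = [1.0, 0.0]
weights = list(original)
rate, presentations = 0.1, 1000
worst_match = 1.0

for k in range(1, presentations + 1):
    angle = (math.pi / 2) * k / presentations      # each input is rotated slightly further
    pattern = [math.cos(angle), math.sin(angle)]
    worst_match = min(worst_match, cosine(weights, pattern))   # match with past learning
    weights = [w + rate * (p - w) for w, p in zip(weights, pattern)]

drift = cosine(weights, original)   # similarity between the final and the original weights
```

Every input matches the current weights closely, yet after the full sequence the weight vector is nearly orthogonal to the pattern it originally coded.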

The many existing variations on the three-level instar-outstar theme illustrate
some of the ways in which this family of models can be adapted to cope with the
basic system’s intrinsic instability. One stabilization technique causes learning to slow
or cease after an initial finite interval, but then a subsequent unexpected pattern
cannot be encoded, and instability could still creep in during the initial learning
phase. Another approach is to restrict the class of input patterns to a stable set.
This technique requires that the system can be sufficiently well analyzed to
identify such a class, like the orthogonal inputs of the linear associative memory model
(Figure 9), and that all inputs can be confined to this class. An often successful
way  to  compensate  for  the  instability  of  these  systems  is  to  slow  the  learning  rate 
to  such  an  extent  that  learned  patterns  are  buffered  against  massive  recoding  by 
any  single  input.  Of  course,  then,  each  pattern  needs  to  be  presented  very  many 
times  for  adequate  learning  to  occur,  a  fact  that  was  discussed,  for  example,  by 
Rosenblatt  in  his  critique  of  back  propagation. 


22.  ADAPTIVE  RESONANCE  THEORY  (ART) 

It was analysis of the instability of feedforward instar-outstar systems that led to
the introduction of adaptive resonance theory (ART) and to the development
of the neural network systems ART 1 and ART 2. ART networks are designed,
in particular, to resolve the stability-plasticity dilemma: they are stable enough
to preserve significant past learning, but nevertheless remain adaptable enough to
incorporate new information whenever it might appear.

The key idea of adaptive resonance theory is that the stability-plasticity
dilemma can be resolved by a system in which the three-level network of Figure 15 is
folded back on itself, identifying the top level (F3) with the bottom level (F1) of the
instar-outstar mapping system. Thus the minimal ART module includes a
bottom-up competitive learning system combined with a top-down outstar pattern learning
system. When an input a is presented to an ART network, system dynamics initially
follow the course of competitive learning (Figure 14), with bottom-up activation
leading to a contrast-enhanced category representation at F2. In the absence of
other inputs to F2, the active category is determined by past learning as encoded
in the adaptive weights in the bottom-up filter. But now, in contrast to feedforward
systems, signals are sent from F2 back down to F1 via a top-down adaptive filter.
This feedback process allows the ART module to overcome both of the sources of
instability described in Section 21, as follows.

First, as in the competitive learning module, the category active at F2 may
poorly match the pattern active at F1. The ART system is designed to carry out a
matching  process  that  asks  the  question:  should  this  input  really  be  in  this  category? 
If  the  answer  is  no,  the  selected  category  is  quickly  rendered  inactive,  before  past 
learning  is  disrupted  by  the  outlier,  and  a  search  process  ensues.  This  search  process 
employs  an  auxiliary  orienting  subsystem  that  is  controlled  by  the  dynamics  of  the 
ART  system  itself.  The  orienting  subsystem  incorporates  a  dimensionless  vigilance 
parameter  that  establishes  the  criterion  for  deciding  whether  the  match  is  a  good 
enough  one  for  the  input  to  be  accepted  as  an  exemplar  of  the  chosen  category. 
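
The match-and-search cycle can be sketched algorithmically for binary patterns. This is a simplified illustration in the spirit of ART 1 and not the real-time network itself: the choice function |I ∩ w_j| / (beta + |w_j|), the match ratio |I ∩ w_j| / |I| compared with the vigilance parameter, the fast-learning update, and the recruitment of a new node when every category fails are all assumptions that the text above does not spell out.

```python
def present(pattern, templates, vigilance=0.7, beta=0.5):
    """Present a binary pattern (a non-empty set of active features); return its category index."""
    # Bottom-up competition: rank existing categories by an assumed choice function.
    order = sorted(range(len(templates)),
                   key=lambda j: len(pattern & templates[j]) / (beta + len(templates[j])),
                   reverse=True)
    for j in order:
        matched = pattern & templates[j]              # top-down template meets the input
        if len(matched) / len(pattern) >= vigilance:  # is the match good enough?
            templates[j] = matched                    # fast learning of the matched pattern
            return j
        # Match too poor: category j is reset and the search moves to the next candidate.
    templates.append(set(pattern))                    # no category passed: recruit a new node
    return len(templates) - 1

templates = []
first = present({0, 1, 2, 3}, templates)                    # recruits category 0
second = present({0, 1, 2}, templates)                      # accepted by category 0
third = present({5, 6, 7}, templates)                       # outlier: recruits category 1
strict = present({0, 1, 2, 3}, templates, vigilance=0.9)    # 3/4 overlap fails: category 2
```

With higher vigilance the fourth pattern is rejected by category 0 before any learning takes place, so the template already stored there is left intact.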

Second,  once  an  input  is  accepted  and  learning  proceeds,  the  top-down  filter 
continues  to  play  a  different  kind  of  stabilizing  role.  Namely,  top-down  signals  that 
represent the past learning meet the original input signals at F1. Thus the F1
activity  pattern  is  a  function  of  the  past  as  well  as  the  present,  and  it  is  this  blend 
of  the  two,  rather  than  the  present  input  alone,  that  is  learned  by  the  weights  in 
both  adaptive  filters.  This  dynamic  matching  during  learning  leads  to  stable  coding, 
even  with  fast  learning. 

An example of the ART 1 class of minimal modules is illustrated in
Figure 16. In addition to the two adaptive filters and the orienting subsystem,
Figure 16 depicts gain control processes that actively regulate learning. Theorems
have been proved to characterize the response of an ART 1 module to an arbitrary
sequence of binary input patterns. ART 2 systems were developed to self-organize
recognition categories for analog as well as binary input sequences. One principal
difference between the ART 1 and the ART 2 modules is shown in Figure 17. In
examples so far developed, the stability criterion for analog inputs has required a
three-layer feedback system within the F1 level: a bottom layer where input
patterns are read in; a top layer where filtered inputs from F2 are read in; and a middle
layer where the top and bottom patterns are brought together to form a matched
pattern that is then fed back to the top and bottom F1 layers.

FIGURE 16 An ART 1 module for stable, self-organizing categorization of an arbitrary
sequence of binary input patterns.


23.  ART  FOR  ASSOCIATIVE  MEMORY 

A minimal ART module is a category learning system that self-organizes a sequence
of input patterns into various recognition categories. It is not an associative
memory system. However, like the competitive learning module in the 1970s, a minimal
ART module can be embedded in a larger system for associative memory. A
system such as an instar-outstar module (Figure 15) or a back-propagation algorithm
(Figure 6) directly pairs sequences of individual vectors (a,b) during learning. If
an ART system replaces levels F1 and F2 of the instar-outstar module, the
associative learning system becomes self-stabilizing. ART systems can also be used to
pair sequences of the categories self-organized by the input sequences (Figure 18).
Moreover, the symmetry of the architecture implies that pattern recall can occur
in either direction during performance. This scheme brings to the associative
memory paradigm the code compression capabilities of the ART system, as well as its
stability properties.

[Figure 17 label: INPUT (ANALOG OR BINARY).]

FIGURE 17 Principal elements of an ART 2 module for stable, self-organizing
categorization of an arbitrary sequence of analog or binary input patterns. The F1 level
is a competitive network with three processing layers.

FIGURE 18 Two ART systems combined to form an associative memory architecture.


24.  COGNITRON  AND  NEOCOGNITRON 

In conclusion, we will consider two sets of models that are variations on the themes
previously described. The first class, developed by Kunihiko Fukushima, consists of
the cognitron and the larger-scale neocognitron. This class of neural models
is distinguished by its capacity to carry out translation-invariant and size-invariant
pattern recognition. This is accomplished by redundantly coding elementary
features in various positions at one level; then cascading groups of features to the next
level; then groups of these groups; and so on. Learning can proceed with or without
a teacher. Locally the computations are a type of competitive learning that uses
combinations of additive and shunting dynamics.


25.  SIMULATED  ANNEALING

Finally, in addition to the probabilistic weight change laws which were a prominent
feature of, for example, the modeling efforts of pioneers such as Rosenblatt and
Amari, another class of probabilistic weight change laws appears in more recent
work under the name simulated annealing, introduced by S. Kirkpatrick, C. D.
Gelatt, and M. P. Vecchi. The main idea of simulated annealing is the transposition
of a method from statistical mechanics, namely the Metropolis algorithm, into
the general context of large complex systems. The Metropolis algorithm provides
an approximate description of a many-body system, namely a material that
anneals into a solid as temperature is slowly decreased. Kirkpatrick et al. drew an
analogy between this system and problems of combinatorial optimization, such as
the traveling salesman problem, where the goal is to minimize a cost function. The
methods and ideas, as well as the large-scale nature of the problem, are so closely
tied to those of neural networks that the two approaches are often linked. This
link is perhaps closest in the Boltzmann machine, which uses a simulated
annealing algorithm to update weights in a binary network similar to the additive model
studied by Hopfield.
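
The Metropolis acceptance rule at the heart of simulated annealing can be sketched on a tiny traveling salesman instance. The sketch is illustrative only: the city coordinates, the segment-reversal move, the geometric cooling schedule, and all parameter values are assumptions rather than details taken from the work cited above.

```python
import math
import random

def tour_length(tour, cities):
    """Total length of the closed tour that visits the cities in the given order."""
    total = 0.0
    for k in range(len(tour)):
        (x1, y1), (x2, y2) = cities[tour[k]], cities[tour[(k + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def anneal(cities, t_start=1.0, t_end=1e-3, cooling=0.995, seed=0):
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    cost = tour_length(tour, cities)
    best_tour, best_cost = list(tour), cost
    temperature = t_start
    while temperature > t_end:
        i, j = sorted(rng.sample(range(len(cities)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # reverse one segment
        candidate_cost = tour_length(candidate, cities)
        delta = candidate_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            tour, cost = candidate, candidate_cost
            if cost < best_cost:
                best_tour, best_cost = list(tour), cost
        temperature *= cooling      # slowly lower the temperature
    return best_tour, best_cost

cities = [(0, 0), (3, 1), (1, 4), (5, 5), (2, 2), (4, 0), (0, 3)]
initial_cost = tour_length(list(range(len(cities))), cities)
best_tour, best_cost = anneal(cities)
```

At high temperature many cost-increasing moves are accepted, which lets the search escape poor local minima; as the temperature falls the rule becomes almost purely greedy.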


26.  CONCLUSION 

We  have  seen  how  the  adaptive  filter  formalism  is  general  enough  to  describe  a 
wide  variety  of  neural  network  modules  for  associative  memory,  category  learning, 
and  pattern  recognition.  Many  systems  developed  and  applied  in  recent  years  are 
variations  on  one  or  more  of  these  modular  themes.  This  approach  can  thus  provide 
a  core  vocabulary  and  grammar  for  further  analysis  of  the  rich  and  varied  literature 
of the neural network field.


Impulse Activity and the Patterning of
Connections During CNS Development


Reprinted from Neuron 5 (1990): 745-756. Permission granted by Cell
Press.

How are the highly ordered sets of axonal connections so characteristic of
organization in the adult vertebrate central nervous system formed during
development? Many problems must be solved to achieve such precise
wiring; axons must grow along the correct pathways and must select their
appropriate target(s). Even once the process of target selection is complete,
however, the many axons that comprise a particular projection must still
arrange themselves in an orderly and highly stereotyped pattern, typically
one in which nearest-neighbor relations are preserved so that the terminal
arbors of neighboring projection neurons are also neighbors within the
target. Here, I would like to consider the process by which this final patterning
of neuronal connections comes about during development. Studies of the
vertebrate visual system, reviewed here, have provided extensive evidence
in favor of the hypothesis that an activity-dependent competition between
axonal inputs for common postsynaptic neurons is responsible in good part
for the establishment of orderly sets of connections.


1991 Lectures in Complex Systems, SFI Studies in the Sciences of Complexity,
Lect.  Vol.  IV,  Eds.  L.  Nadel  &  D.  Stein,  Addison-Wesley,  1992 



Carla  J.  Shatz 


COMPETITION  IN  THE  FORMATION  OF  OCULAR  DOMINANCE 
COLUMNS  IN  THE  MAMMALIAN  PRIMARY  VISUAL  CORTEX. 

Many  insights  into  developmental  mechanisms  underlying  the  formation  of  orderly- 
connections  have  come  from  studies  of  the  mammalian  visual  system,  in  which  the 
clear-cut  patterning  of  connections  is  exemplified  in  the  highly  topographic  order¬ 
ing  of  projections  and  strict  segregation  of  inputs  from  the  two  eyes  at  successive 
levels  of  visual  information  processing  (for  reviews,  see  Rodieck^''  and  Sherman 
and  Spear®*^).  Ganglion  cell  axons  from  each  eye  project  to  the  lateral  geniculate 
nucleus  (LGN)  on  both  sides  of  the  brain.  However,  within  the  LGN,  axons  from 
the  two  eyes  terminate  in  a  set  of  separate,  alternating  eye-specific  layers  that  are 
strictly  monocular^^  (see  Figure  1).  Neurons  in  the  LGN  project,  in  turn,  to  layer 
4  of  the  primary  visual  cortex  where,  again,  axons  are  segregated  according  to  eye 
of  origin  into  alternating  monocularly  innervated  patches  that  represent  the  system 
of ocular dominance columns within cortical layer 4 (refs. 23, 26, 59, 60).


FIGURE  1  A  simplified  diagram  of  the  mammalian  visual  pathways.  Only  connections 
from  each  eye  to  the  left  side  of  the  brain  are  shown.  Retinal  ganglion  cell  axons 
from the two eyes travel to the lateral geniculate nucleus (LGN) of the thalamus,
where  their  terminals  are  segregated  in  separate  eye-specific  layers.  The  axons  of 
neighboring  retinal  ganglion  cells  within  each  eye  terminate  in  neighboring  regions 
within  the  appropriate  layers,  establishing  a  topographically  ordered  map.  LGN  neurons, 
in  turn,  project  to  layer  4  of  the  primary  visual  cortex  where  again  axonal  terminal 
arbors  of  LGN  neurons  representing  the  two  eyes  are  segregated  into  alternating  ocular 
dominance  patches. 
FIGURE 2 Summary of the prenatal development of the eye-specific layers in the
cat’s LGN. Shaded areas indicate regions within the LGN simultaneously occupied by
ganglion  cell  axons  from  the  two  eyes  at  different  times  in  development,  as  derived 
by  the  anterograde  transport  of  intraocularly  injected  tracers.  Stick  figures  show  the 
appearance  of  representative  ganglion  cell  axons  from  the  ipsilateral  (shorter  axons 
at  each  age)  and  contralateral  (longer)  eyes,  based  on  studies  of  the  morphology  of 
individual axons filled with horseradish peroxidase in vitro (see Shatz,^^ for more
details;  reproduced  with  permission  from  Shatz^^).  The  eye-specific  layers  emerge 
as retinal ganglion cell axons withdraw delicate sidebranches from inappropriate
regions,  and  elaborate  complex  terminal  arbors  within  appropriate  regions  of  the  LGN. 
E=embryonic  age;  gestation  in  the  cat  is  65  days. 


Remarkably,  neither  the  layers  within  the  LGN  nor  the  columns  within  the  cor¬ 
tex  are  present  initially  during  development  (for  reviews,  see  Sretavan  and  Shatz,®® 
Shatz,®^ and Miller and Stryker'*^). When retinal ganglion cell axons from the two
eyes  first  grow  into  the  LGN,  they  are  intermixed  with  each  other  throughout  a  good 
portion  of  the  nucleus;  the  eye-specific  layers  emerge  as  axons  from  the  two  eyes 
gradually  remodel  by  withdrawing  modest  branches  from  inappropriate  territory 
and  growing  extensive  terminal  arbors  within  appropriate  territory®®  (Figure  2). 
Physiological studies in vitro^^ and electron microscopic examination of identified
retinal  ganglion  cell  axons®’®^  suggest  that  this  remodeling  is  accompanied  by  the 
reorganization  of  synapses  from  the  two  eyes  such  that  initial  binocular  convergence 
is  replaced  by  monocular  inputs. 
FIGURE 3 The postnatal development of the ocular dominance patches within layer
4  of  the  primary  visual  cortex  of  the  cat  is  summarized.  The  location  of  LGN  axons  is 
monitored  by  means  of  the  transneuronal  transport  through  the  LGN  of  radioactively 
labeled material (which appears white in these darkfield photographs) injected into one
eye.  The  adult  pattern  of  layer  4  labeling — patches  separated  by  gaps  of  roughly  equal 
size — can  be  seen  by  92  days  postnatal.  However,  at  2  weeks  postnatal,  the  pattern  of 
labeling  within  layer  4  is  continuous,  indicating  that  LGN  axons  representing  the  two 
eyes are intermixed with each other. (Based on experiments presented in LeVay et
al.^'‘)


Ocular  dominance  columns  in  layer  4  form  from  extensively  intermixed  LGN 
inputs  representing  the  two  eyes  (Figure  3),  presumably  also  by  a  process  of  axonal 
remodeling  and  synapse  elimination.  At  present,  little  is  known  about  the  exact 
morphological  details  because  few  individual  axons  have  been  successfully  labeled 
for  study,  but  microelectrode  recordings  have  shown  that  initially  the  majority  of 
neurons  in  cortical  layer  4  receive  functional  inputs  from  LGN  afferents  representing 
both  eyes.^'*’^®  Thus,  here  too,  ocular  segregation  emerges  from  an  initial  condition 


FIGURE 4 The effects of monocular eye closure at birth on the subsequent
(adult)  organization  of  the  ocular  dominance  columns  in  layer  4  of  the  monkey  visual 
cortex,  as  revealed  by  the  transneuronal  transport  method  (see  Figure  3).  (a)  The 
normal  tangential  organization  of  LGN  afferents  within  layer  4  into  alternating  stripes 
of  equal  width  representing  the  injected  and  uninjected  eye.  (b)  The  representation 
of  the  open  eye  within  layer  4  following  monocular  deprivation — LGN  axons  occupy 
most  of  layer  4,  with  only  small  unlabeled  regions  remaining  for  the  LGN  axons 
representing  the  closed  eye.  (c)  The  pattern  of  transneuronal  labeling  resulting  from 
injection of the closed eye is complementary to that shown in (b), indicating a shrinkage
of  territory  devoted  to  the  representation  of  the  closed  eye  within  layer  4.  Reprinted  with 
permission from Hubel et al.^^


of  functional  synaptic  convergence  of  inputs  representing  the  two  eyes  onto  common 
(layer 4 cortical) neurons (see Figure 5; compare neonate and adult). In higher
mammals,  the  formation  of  the  LGN  layers  occurs  largely,  if  not  entirely,  prenatally 
and  precedes  the  onset  of  ocular  dominance  column  formation  within  the  cortex, 
which  occurs  largely  (monkey)  or  entirely  (cat)  postnatally.^'’’^®’'*^  ®^ 

How  do  inputs  representing  the  two  eyes  segregate  from  each  other  to  form 
layers or columns? The first clues came from the pioneering studies of Hubel and
Wiesel  on  the  effects  of  visual  deprivation  on  the  functional  organization  of  the 
primary  visual  cortex.  In  the  normal  adult  visual  cortex,  the  majority  of  neurons  are 
binocular:  that  is,  they  respond  to  visual  stimulation  of  either  eye.  Even  binocular 
neurons  tend  to  be  dominated  by  one  eye  or  the  other,  and  as  mentioned  above, 
layer  4  neurons  tend  to  be  exclusively  driven  by  stimulation  of  one  eye  only  so  that 
the  cortex  is  evenly  divided  into  ocular  dominance  columns  for  both  eyes.‘^  *®’®° 
However,  if  one  eye  is  deprived  of  vision  by  closing  the  eyelids  at  birth  for  several 
days  to  weeks,  the  ocular  dominance  distribution  of  neurons  in  visual  cortex  is 
drastically  shifted:  as  shown  in  Figure  5  (MD),  now,  the  majority  (90%)  of  neurons 
are  monocularly  driven  only  by  stimulation  of  the  open  eye.^®'^®  (Neurons  in  the 
retina  and  LGN  remain  responsive  to  their  normal  inputs.^®)  The  physiological  shift 
in  ocular  dominance  within  the  cortex  is  paralleled  by  a  profound  change  in  the 
anatomical organization of LGN axons within layer 4: LGN axons representing the
open  eye  now  occupy  most  of  layer  4,  while  those  representing  the  closed  eye  are 
relegated  to  very  small  patches^®  (see  Figure  4). 

The  observation  that  the  wiring  of  LGN  axons  and  the  eye  preference  of  cor¬ 
tical neurons can be influenced by early visual experience sets the stage for the
idea  that  use  of  the  visual  system  is  required  for  its  normal  development  and  for 
the  maintenance  of  its  connections,  at  least  during  an  early  period  of  susceptibility 
called  the  “Critical  Period.” But  how  might  abnormal  use,  such  as  the  occlu¬ 
sion  of  one  eye,  result  in  such  profound  changes  in  connectivity  at  the  level  of  the 
visual  cortex?  The  most  reasonable  explanation  is  that  a  use-dependent  synaptic 
competition  between  LGN  axons  serving  the  two  eyes  for  layer  4  neurons  normally 
drives  the  formation  of  the  ocular  dominance  columns  during  the  critical  period. 


Impulse Activity and the Patterning of Connections During CNS Development




[Figure 5 panels: ADULT, NEONATE, TTX, MD+MUSC, STRAB, MD+APV; horizontal axis: ocular dominance groups 1 to 5]

FIGURE  5  Idealized  summary  diagram  of  the  effects  of  various  manipulations  that 
alter  the  pattern  or  levels  of  visually  driven  activity  on  the  ocular  dominance  of  visual 
cortical  neurons  as  assessed  physiologically.  In  the  normal  adult  cortex  (ADULT),  the 
majority  of  neurons  are  binocularly  driven,  with  a  roughly  even  distribution  of  neurons 
representing  each  eye  (group  1  =  neurons  responding  exclusively  from  the  right  eye; 
group  2  =  neurons  responding  predominantly  to  the  right  eye  but  some  also  from  the 
left  eye;  group  3  =  neurons  responding  equally  to  the  two  eyes;  group  4  =  neurons 
responding predominantly to the left eye; and group 5 = neurons responding exclusively
to the left eye).®°’^^ The majority of group 1 and 5 neurons are found within layer 4. In
neonates,  there  are  very  few  monocularly  driven  neurons,  presumably  since  inputs  from 
the  two  eyes  are  extensively  intermixed  even  within  layer  4.^“'  If  the  right  eye  is  closed 
at  birth  (0),  then  an  ocular  dominance  shift  in  favor  of  the  left  eye  (0)  occurs  with  long¬ 
term  monocular  deprivation  (MD).^^-®°  However,  if  MD  is  combined  with  cortical  infusion 
of  muscimol  (MD+MUSC)®^  during  the  critical  period,  then  the  shift  in  favor  of  the  open 
eye  is  prevented  close  to  the  infusion  site  and  instead  a  shift  in  favor  of  the  closed 
eye results. Binocular deprivation (BD)^® during the critical
period,  however,  does  not  have  an  obvious  effect  on  cortical  ocular  dominance, 
whereas intraocular injections of TTX during the same period retain, or possibly
exaggerates,  the  highly  binocular  distribution  present  in  neonates'^  (cf.  TTX  and 
Neonates).  In  contrast,  alternating  monocular  deprivation  or  strabismus  (STRAB) 
causes  a  complete  loss  of  binocular  neurons  within  the  cortex.'^  See  text  for  further 
details. 


Consequently, unequal use caused by monocular deprivation could bias the out¬
come  in  favor  of  the  open  eye.  Many  lines  of  evidence  support  the  suggestion  that 
competitive interactions are involved. Binocular deprivation leaves the ocular
dominance of cortical neurons unaltered (Figure 5: BD), although neurons eventually
do not respond briskly to visual stimulation. In a clever experiment, Guillery^^
demonstrated  that  competitive  interactions  are  not  only  present,  but  must  occur 
locally  within  the  cortex,  between  LGN  axons  subserving  corresponding  regions  of 
the  visual  field.  He  sutured  one  eye  closed  and  then  made  just  a  small  lesion  in  the 
open  eye,  destroying  a  localized  group  of  ganglion  cells  there.  As  a  consequence, 
the  effects  of  monocular  deprivation  were  manifested  everywhere  except  within  the 
small region receiving LGN axons representing the lesioned area of the open eye and
the  corresponding  region  of  the  closed  eye.  Thus  equal  use  of  the  two  eyes  during 
the  critical  period  subserves  competitive  interactions  whose  outcome  is  manifested 
in  the  even  distribution  of  ocular  dominance  columns.  Similar  competitive  interac¬ 
tions  are  thought  to  operate  even  earlier  in  development  to  drive  the  formation  of 
LGN  layers,  as  discussed  more  fully  at  the  conclusion  of  this  article. 


THE ROLE OF PATTERNED NEURAL ACTIVITY IN
COMPETITIVE INTERACTIONS

Signalling  by  neurons  is,  of  course,  via  action  potentials  and  synaptic  transmission; 
hence,  the  effects  of  visual  experience  on  cortical  organization  must  be  a  conse¬ 
quence  of  alterations  in  either  the  level  or  patterning  (or  both)  of  neural  activity 
within  the  visual  pathways.  The  most  graphic  demonstration  that  this  must  be  the 
case  comes  from  experiments  in  which  the  inputs  from  both  eyes  are  completely  si¬ 
lenced  by  injecting  Tetrodotoxin  (TTX),  a  blocker  of  the  sodium  channel,  for  several 
weeks  postnatally  during  the  critical  period  and  then  examining  the  consequences 
on  the  formation  of  ocular  dominance  columns  in  layer  4.^^  Intraocular  application 
of  TTX  conveniently  silences  the  entire  pathway  from  retina  to  cortex  since  there 
is  very  little  spontaneously  generated  activity  in  central  visual  pathways  in  the  ab¬ 
sence  of  the  eyes.^^  Segregation  of  LGN  axons  into  patches  within  cortical  layer 
4  was  prevented  completely,  and  neurons  in  layer  4,  normally  monocularly  driven, 
were instead binocularly driven (Figure 5: TTX), reminiscent of the initial period
of normal postnatal development.^’^® Indeed, at present it is not known whether
the  effect  of  the  TTX  treatment  is  to  simply  arrest  development  or  to  permit  con¬ 
tinued  but  undirected  growth  of  LGN  axon  terminals  within  layer  4.  Examination 
of  axonal  morphology  should  eventually  clarify  this  issue.  Analogous  results  are  ob¬ 
tained when cortical activity alone is blocked (both pre- and postsynaptically), by
infusing  TTX  locally  via  an  osmotic  minipump®';  such  treatment,  when  performed 
during  the  critical  period,  prevents  the  shift  in  cortical  ocular  dominance  produced 
by  monocular  eye  closure. 

These  experiments  indicate  that  neural  activity  is  necessary  for  ocular  dom¬ 
inance  columns  to  form  during  development  (and  for  them  to  be  perturbed  with 
monocular  deprivation),  but  they  do  not  reveal  how  an  activity-dependent  signal 
might  permit  the  selection  of  appropriate  inputs  from  each  eye  to  generate  the  seg¬ 
regated  pattern  characteristic  of  the  adult  geniculocortical  projection.  Experiments 
in  which  the  use  of  the  two  eyes  remains  equal,  but  is  never  synchronous,  provide 
some  clues.  During  the  critical  period,  if  artificial  strabismus  is  produced  by  cut¬ 
ting  the  extraocular  muscles  of  one  eye,  thereby  disrupting  normal  eye  alignment, 
or  if  the  eyes  are  closed  alternately  so  that  the  total  amount  of  vision  received  by 
each  eye  is  the  same,  but  vision  is  never  binocular,  then  essentially  every  neuron  in 
the  primary  visual  cortex  becomes  exclusively  monocularly  innervated,  with  cells 
of like ocular dominance grouped into entirely “monocular” columns (see Figure 5:
AMD)  (recall  that  in  normal  animals,  only  layer  4  is  monocular).*'’  These  results 
suggest  that  information  concerning  the  relative  timing  of  activity  in  the  two  eyes 
is  somehow  used  to  distinguish  inputs  at  the  cortical  level:  asynchrony  leads  to 
ocular  segregation;  synchrony  maintains  binocularity. 

The  conclusion  that  the  formation  of  ocular  dominance  columns  is  influenced 
by  the  timing  and  patterning  of  neuronal  activity  within  the  retinae  is  underscored 
by  the  results  of  an  experiment  by  Stryker  and  Strickland'^^  in  which  retinal  ac¬ 
tivity  was  first  blocked  by  intraocular  injections  of  TTX,  but  then  experimentally 
controlled  by  electrically  stimulating  the  optic  nerves  either  synchronously  or  asyn¬ 
chronously.  Synchronous  stimulation  of  the  two  nerves  prevented  the  formation  of 
ocular  dominance  columns,  whereas  asynchronous  stimulation  permitted  them  to 
form.  The  only  difference  between  the  two  experiments  was  the  timing  of  stimula¬ 
tion,  thereby  demonstrating  directly  that  the  patterning  of  neural  activity  provides 
sufficient  information  for  ocular  segregation  to  occur,  at  least  at  the  level  of  the 
primary  visual  cortex. 
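
The contrast between synchronous and asynchronous stimulation can be illustrated with a toy calculation. The Python sketch below is not the model used in the experiments cited above; it assumes a single linear cortical cell receiving one input from each eye, a Hebbian update with subtractive normalization, and arbitrary values for the learning rate, the starting weights, and the number of steps. The names simulate, eta, and w_init are hypothetical.

```python
def simulate(pattern, steps=2000, eta=0.05, w_init=(0.51, 0.49)):
    """Return the final (w_left, w_right) for one model cortical cell.

    pattern: "sync"  -> both eyes are active together on every step
             "async" -> the two eyes take turns being active
    """
    w = list(w_init)
    for t in range(steps):
        if pattern == "sync":
            x = (1.0, 1.0)
        elif pattern == "async":
            x = (1.0, 0.0) if t % 2 == 0 else (0.0, 1.0)
        else:
            raise ValueError("pattern must be 'sync' or 'async'")

        # Postsynaptic response of a linear cell (an assumption).
        y = w[0] * x[0] + w[1] * x[1]
        mean_x = 0.5 * (x[0] + x[1])

        for i in range(2):
            # Hebbian term with its mean across inputs subtracted, so that
            # one input can only gain strength at the expense of the other.
            w[i] += eta * y * (x[i] - mean_x)
            # Keep each weight within fixed bounds.
            w[i] = min(1.0, max(0.0, w[i]))
    return tuple(w)


if __name__ == "__main__":
    print("synchronous: ", simulate("sync"))
    print("asynchronous:", simulate("async"))
```

Under these assumptions, synchronous input leaves both weights at their starting values, so the cell stays binocular, whereas asynchronous input drives one weight to its ceiling and the other to zero. Which eye takes over in this toy depends on the starting weights and on which eye is stimulated first; only the qualitative contrast is meant to carry over.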

The  above  considerations  can  also  explain  why  ocular  dominance  columns  can 
develop even when animals are binocularly deprived or reared in the dark dur¬
ing  the  critical  period. In  the  absence  of  visual  stimulation,  ganglion  cells  in  the 
mammalian  retina  of  adults^®’®"*  and  even  in  fetal  animals'®’^®  fire  action  poten¬ 
tials  spontaneously.  Such  spontaneous  firing  could  supply  activity-dependent  cues, 
provided  that  ganglion  cell  firing  in  the  two  eyes  is  asynchronous. 

Before  examining  further  how  the  timing  and  patterning  of  impulse  activity 
might  lead  to  segregation  of  geniculocortical  afferents,  it  is  worth  considering  briefly 
why  an  alternative  hypothesis  for  the  formation  of  segregated  inputs,  one  that  in¬ 
vokes  the  existence  of  eye-specific  molecular  labels  within  the  cortex,  is  at  odds  with 
most experimental observations. First, geniculocortical axons segregate to form oc¬
ular  dominance  columns  whose  precise  locations  within  the  visual  cortex  are  unpre¬ 
dictable,  although  the  global  arrangement  of  the  columns  is  similar  from  one  animal 
to  the  next.^®  Second,  blockade  of  neural  activity  within  the  eyes  prevents  segre¬ 
gation  of  geniculocortical  axons  (themselves  not  directly  affected  by  TTX).  Thus, 
if  eye-specific  labels  were  present  within  the  cortex,  they  should  still  have  been 
recognized  by  LGN  axons.  Moreover,  such  markers  should  operate  to  form  columns 
regardless of the pattern of electrical stimulation of the optic nerves (synchronous
vs.  asynchronous).  Third,  there  is  no  obvious  tendency  for  axons  representing  the 
right or left eyes to be grouped together prior to segregation^'*’®^; indeed, activity-
dependent models of ocular dominance column formation can easily produce seg¬
regated inputs from an initially randomly intermixed condition (see Miller et al.
for more details). It should be noted that the absence of such labels with respect to
eye of origin in no way argues against the existence of specific molecular cues that
could  initially  guide  axons  to  their  appropriate  targets  (LGN,  visual  cortex)  during 
development,  or  that  could  help  to  establish  coarse  retinotopic  projections  within 
these  targets.  By  analogy  with  studies  in  lower  vertebrates  (for  review  see  Udin  and 
Fawcett^®), such cues are highly likely to be present in the mammalian CNS as well.
However,  once  axons  reach  their  correct  target  and  establish  a  coarse  topographic 
projection,  activity-dependent  interactions  could  provide  the  major  cues  necessary 
for  segregation. 

Finally,  a  set  of  creative  experiments  performed  in  the  amphibian  visual  system 
also  argues  against  the  presence  of  intrinsic  eye-specific  labels  within  the  postsynap- 
tic  targets  of  retinal  ganglion  cells.  In  amphibians  during  larval  development,  the 
projections from retinal ganglion cells to their principal target, the optic tectum,
are  entirely  crossed.  Consequently,  each  tectum  receives  a  map  from  the  whole 
contralateral retina. The map is topographically orderly, such that the axons of
neighboring  retinal  ganglion  cells  terminate  in  neighboring  regions  of  the  optic 
tectum.  In  frogs,  it  is  possible  to  perform  experimental  manipulations  in  the  em¬ 
bryo  to  transplant  an  extra  eye  onto  one  side  of  the  head.  Axons  from  both  the 
normal  and  transplanted  eyes  are  then  capable  of  growing  into  the  optic  tectum, 
artificially  creating  a  competitive  situation.  Constantine-Paton  and  her  colleagues 
have shown that axons from both eyes segregate into eye-specific stripes reminis¬
cent  of  the  stripe-like  pattern  of  the  mammalian  geniculocortical  projection^^  (see 
Figure  6(a)).  Thus,  segregation  of  eye-specific  inputs  can  occur  in  an  experimen¬ 
tally  manipulated  system  that  normally  never  forms  a  segregated  projection  and 
therefore  is  highly  unlikely  to  contain  intrinsic  eye-specific  labels  within  the  post- 
synaptic  target.  Moreover,  blockade  of  action  potential  activity  with  TTX  causes 
ganglion  cell  axons  from  the  two  eyes  to  desegregate®’^®’®®  (see  Figure  6(b)).  In 
amphibia,  connections  between  retina  and  tectum  continue  to  grow  throughout  lar¬ 
val  and  early  postmetamorphic  life,  in  a  process  involving  the  continual  reshaping 
of  synaptic  connections.*'*’'*®  These  experiments  suggest  an  analogous  conclusion, 
that  the  maintenance  of  segregated  inputs  in  these  three-eyed  frogs  is  a  dynamic 
ongoing process that requires neural activity (presumably asynchronous) in the two
eyes. 
FIGURE 6 The organization of ganglion cell axon projections to a dually innervated
optic  tectum  in  three-eyed  frogs,  as  revealed  in  tectal  wholemounts  by  injecting  one  of 
the  two  eyes  with  horseradish  peroxidase,  (a)  Axons  from  the  two  eyes  segregate  into 
alternating  stripes  reminiscent  of  the  system  of  mammalian  ocular  dominance  columns 
in  cortical  layer  4.  Desegregation  occurs  when  either  TTX  (not  shown)  or  APV  ((b) 
after 2.5 weeks; (c) after 4 weeks of treatment) is infused into the tectum. Modified, with
permission,  from  Cline  et  al.^ 


CELLULAR  CORRELATES  OF  ACTIVITY-DEPENDENT 
COMPETITION 

The  finding  that  the  synchronous  activation  of  afferents  prevents  them  from  seg¬ 
regating, while asynchronous activation promotes segregation, indicates that the
timing of presynaptic activity is crucial to the process. Studies also suggest that
involvement of the postsynaptic cell is necessary. For example, in the mammalian
visual  cortex,  when  visual  stimulation  through  one  eye  is  paired  simultaneously  with 
postsynaptic  depolarization  produced  by  extracellular  stimulation,  the  strength  of 
inputs  from  the  stimulated  eye  can  be  enhanced  in  some  cells  from  minutes  to 
hours. The  effect  is  quite  variable  and  is  more  frequently  produced  in  young  ani¬ 
mals  during  the  critical  period  than  in  adults;  nevertheless,  this  experiment  serves 
to  illustrate  the  point  that  coincidence  of  pre-  with  postsynaptic  activity  can,  at 
least  under  certain  circumstances,  enhance  visually  driven  inputs. 

Manipulations  that  block  postsynaptic  activity  exclusively  can  also  alter  the 
outcome  of  competition  in  the  visual  system.  Reiter  and  Stryker^^  have  shown  that 
when cortical neurons are silenced during the critical period by the intracortical
infusion of muscimol, a GABA-A receptor agonist, monocular eye closure has sur¬
prising  consequences  for  the  inputs  from  the  two  eyes;  within  the  silenced  region  of 
cortex,  inputs  from  the  closed  eye  come  to  dominate  over  inputs  from  the  open  eye 
(see Figure 5: MD and muscimol), whereas, of course, the reverse is true outside the
silenced  zone.  This  observation  shows  that  the  activity  of  postsynaptic  cortical  cells 
is  highly  likely  to  be  involved  in  the  synaptic  reorganization  occurring  during  the 
critical  period,  since  the  same  patterning  of  presynaptic  activity  produces  different 
outcomes  depending  on  the  state  of  activation  of  the  postsynaptic  cell. 

The  requirement  for  the  participation  of  both  pre-  and  postsynaptic  partners 
in  activity-dependent  rearrangements,  and  the  fact  that  coincident  activation  can 
strengthen  coactivated  inputs,  is  consistent  with  the  idea  that  a  Hebb  rule  may 
govern  the  process  of  synapse  rearrangements  during  ocular  dominance  column  de¬ 
velopment  in  mammals  (and  in  three-eyed  frogs)  (for  review  see  Brown  et  al.^; 
see  Kossel  et  al.^°  for  an  alternate  view).  Hebb^^  suggested  that  when  pre-  and 
postsynaptic  neurons  are  coactivated,  their  synaptic  connections  are  strengthened, 
whereas  connections  are  weakened  with  the  lack  of  coincident  activation.  In  this 
context,  the  muscimol  experiment  described  above^^  is  also  consistent  with  a  Hebb 
rule  in  the  sense  that  the  levels  of  presynaptic  activity  in  geniculocortical  axons  rep¬ 
resenting  the  closed  eye  are  better  matched  to  the  silenced  postsynaptic  neurons 
than those inputs representing the open eye. Thus, the correlated firing of nearby
ganglion  cells  within  one  eye,  and  the  lack  of  synchronous  firing  of  ganglion  cells  in 
the  other  eye  could  provide  appropriate  signals  to  produce  the  regional  strengthen¬ 
ing  and  weakening  of  synaptic  inputs  needed  for  segregation  to  take  place. 
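
The argument that closed-eye inputs are "better matched" to silenced cortical cells can be made concrete with a small worked example. The Python sketch below assumes one common formalization of a Hebb rule, a covariance rule in which the weight change is proportional to the product of pre- and postsynaptic activity, each measured relative to a reference level. That choice, the function names, and every activity value are illustrative assumptions and are not taken from the experiments described here.

```python
def covariance_dw(pre, post, pre_ref=0.5, post_ref=0.5, eta=0.1):
    """Weight change under an assumed covariance-style Hebb rule.

    Positive when pre- and postsynaptic activity deviate from their
    reference levels in the same direction, negative otherwise.
    """
    return eta * (pre - pre_ref) * (post - post_ref)


# Activity on an arbitrary 0..1 scale (illustrative, not measured values).
OPEN_EYE = 0.9          # afferents driven by the open eye
CLOSED_EYE = 0.1        # afferents driven by the closed eye
ACTIVE_CORTEX = 0.9     # cortical cell driven mainly by the open eye
SILENCED_CORTEX = 0.0   # cortical cell silenced, as with muscimol


def favored_eye(post):
    """Return which eye's synapse gains more for a given postsynaptic level."""
    dw_open = covariance_dw(OPEN_EYE, post)
    dw_closed = covariance_dw(CLOSED_EYE, post)
    return "open" if dw_open > dw_closed else "closed"


if __name__ == "__main__":
    print("monocular deprivation, active cortex:  ", favored_eye(ACTIVE_CORTEX))
    print("monocular deprivation, silenced cortex:", favored_eye(SILENCED_CORTEX))
```

With an active postsynaptic cell the rule favors the open eye, and with a silenced cell it favors the closed eye, which is the qualitative pattern reported for monocular deprivation with and without muscimol. Other formalizations of a Hebb rule could give the same ordering, so the sketch shows consistency rather than proof.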

These  activity-dependent  properties  of  visual  cortical  synapses  during  the  criti¬ 
cal  period  are  very  reminiscent  of  some  of  the  well-known  characteristics  of  synapses 
in  the  adult  mammalian  hippocampus  that  are  capable  of  undergoing  long-term  po¬ 
tentiation  (LTP):  that  is,  a  long  lasting  increase  in  synaptic  strength  produced  with 
the appropriate matching of pre- and postsynaptic activation.^ "*® In the CA1 region
of  the  hippocampus,  many  lines  of  experimentation  indicate  that  activation  of  the 
NMDA receptor (N-methyl-D-aspartate) on postsynaptic neurons by means of the
presynaptic  release  of  glutamate  is  required  for  LTP.**®  The  consequent  strength¬ 
ening  of  synaptic  transmission  appears  to  be  due  at  least  in  part  to  a  presynaptic 
change: an increase in transmitter release from the presynaptic terminals.

The  wealth  of  information  on  LTP,  and  its  similarities  with  activity-dependent  de¬ 
velopment, has prompted many recent experiments in the visual system designed to
learn whether the two forms of synaptic change share similar cellular mechanisms.
Of  course,  it  should  be  noted  that  in  at  least  one  respect,  the  two  must  differ 
ultimately  in  that  in  development  major  structural  changes  occur  not  only  in  indi¬ 
vidual  synapses  but  also  in  the  overall  morphology  of  presynaptic  terminals,  since 
some  terminals  are  actually  eliminated  while  others  are  newly  formed.  Moreover, 
the  physiological  properties  of  developing  synapses  are  very  different  from  those  of 
adult, suggesting that the parameters for patterned activity to produce synaptic
change  may  also  differ. 

A  major  question  is  whether  NMDA-receptor  activation  is  necessary  for  de¬ 
velopmental  plasticity.  This  is  a  reasonable  question  to  pose  since  glutamate  is 
thought to be the excitatory neurotransmitter released by retinal ganglion cells
in  all  vertebrates^^'^^  and  also  by  LGN  neurons  in  mammals. The  most  com¬ 
pelling  evidence  in  favor  of  the  specific  involvement  of  NMDA-receptors  in  activity- 
dependent development comes from recent studies of the retinotectal system in
fish  and  frogs.  For  instance,  in  three-eyed  frogs,  the  ocular  dominance  stripes 
desegregate  in  the  presence  of  the  NMDA  receptor  antagonist  APV  (2-amino-5- 
phosphonovaleric  acid),^  suggesting  that  activation  of  this  receptor  is  necessary  for 
the  maintenance  of  segregated  inputs  (see  Figures  6(b)  and  (c)). 

NMDA  receptor  activation  is  also  apparently  necessary  for  the  maintenance  of 
two  other  activity-dependent  processes  known  to  occur  in  the  retinotectal  system. 
The  first  is  in  the  refinement  of  topographic  projections  from  retina  to  tectum  that 
occurs  during  regeneration  of  the  optic  nerve  in  goldfish.  Ganglion  cell  axons  can 
establish  coarse  topographic  projections  even  when  activity  is  blocked  with  TTX, 
presumably  because  activity-independent  molecular  cues  are  unaltered. How¬ 
ever,  the  fine-tuning  of  axon  terminal  arbors  necessary  for  the  re-establishment  of 
highly  refined  connections  is  prevented.'*' In  this  case,  topographic  fine-tuning 
would  be  expected  to  occur  if  the  activity  of  neighboring  retinal  ganglion  cells  was 
highly  correlated,  while  that  of  distant  ganglion  cells  was  not — a  situation  naturally 
produced  with  visual  stimulation.  Consistent  with  this  suggestion,  rearing  animals 
in stroboscopic light, which causes all ganglion cells to fire in near synchrony, pre¬
vents  the  fine-tuning  of  topography  during  regeneration.®^  Recent  experiments  have 
demonstrated  that  infusion  of  APV  also  blocks  the  fine-tuning  of  the  retinotectal 
map.®*^  Moreover,  Schmidt  has  demonstrated  that  during  the  period  of  map  refine¬ 
ment  following  optic  nerve  regeneration,  low  frequency  electrical  stimulation  of  the 
optic  nerve  causes  long-term  potentiation  of  the  postsynaptic  tectal  response  which 
is  also  blocked  by  APV. 
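
The correlation argument in this paragraph can be sketched numerically. The Python example below assumes a one-dimensional row of ganglion cells with Gaussian sensitivity profiles, a spot of light sweeping across them as a stand-in for patterned visual stimulation, and an idealized strobe that drives every cell identically. The stimuli, the cell count, the profile width, and the function names are all hypothetical choices made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def moving_spot(n_cells=20, width=1.5):
    """Activity of each cell over time as a spot sweeps across the row.

    At time t the spot sits at position t; a cell at position pos responds
    according to a Gaussian of the distance between spot and cell.
    """
    return [[math.exp(-((pos - t) ** 2) / (2 * width ** 2))
             for t in range(n_cells)]
            for pos in range(n_cells)]


def strobe(n_cells=20, n_steps=20):
    """Every cell fires on each flash and is silent between flashes."""
    train = [1.0 if t % 2 == 0 else 0.0 for t in range(n_steps)]
    return [list(train) for _ in range(n_cells)]


if __name__ == "__main__":
    patterned = moving_spot()
    flashes = strobe()
    for label, activity in (("patterned", patterned), ("strobe", flashes)):
        near = pearson(activity[9], activity[10])   # adjacent cells
        far = pearson(activity[9], activity[17])    # distant cells
        print(f"{label}: neighbors r={near:.2f}, distant r={far:.2f}")
```

With the sweeping spot, adjacent cells are strongly correlated and distant cells are not, so correlation carries information about retinal position. Under the idealized strobe every pair of cells is perfectly correlated, and that positional cue disappears.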

Another  example  demonstrating  the  involvement  of  NMDA  receptors  involves 
the  process  by  which  binocular  neurons  are  normally  created  and  maintained  in 
the  frog  optic  tectum.  Although  the  optic  tectum  only  receives  direct  input  from 
the  retinal  ganglion  cells  in  the  opposite  eye,  an  indirect  pathway  from  one  tectum 
to  the  other  via  a  relay  nucleus,  the  isthmo-tectal  nucleus,  does  convey  input  from 
the  other  eye  to  create  binocular  neurons.  Here  too,  the  maintenance  of  the  binoc¬ 
ular  map  is  activity  dependent,  as  demonstrated  by  the  fact  that  rotation  of  one 
eye  in  its  orbit  leads  to  a  systematic  and  anatomically  demonstrable  re-wiring  of 
isthmo-tectal  connections  so  as  to  preserve  ocular  correspondence.'^®  The  rewiring 
induced  by  eye  rotation  is  prevented  by  infusion  of  APV  into  the  optic  tectum.®®  An 
essential  finding  in  these  studies  is  that  the  levels  of  APV  necessary  to  prevent  the 
activity-dependent rearrangements apparently do not block appreciably retinotec¬
tal  synaptic  transmission  or  the  excitability  of  the  postsynaptic  neuron.®’®*  Thus, 
the  APV  treatment  does  not  act  like  TTX  to  block  neural  activity  generally,  but 
more  likely  acts  specifically  to  prevent  whatever  cascade  of  events  is  triggered  by 
NMDA  receptor  activation. 

The  specific  involvement  of  the  NMDA  receptor  in  the  synaptic  alterations 
occurring during the critical period in the mammalian visual cortex is more controversial,
but there is no doubt that NMDA receptors are present throughout (but also
after) the relevant times in the cat visual system. Physiological studies of cortical
neurons demonstrate that both their spontaneous firing and their responses to visual
stimulation can be decreased by APV, and that lower doses of APV are needed
in younger animals. Moreover, Fox et al.’® found that there is a systematic
change in the laminar distribution of responsiveness to iontophoretic application
of APV with age; in neonates, neurons in all cortical layers are sensitive to APV,
whereas by the end of the critical period, the visually evoked responses of neurons in
the deeper cortical layers (layers 4, 5, and 6) are not affected by APV iontophoresis.
Thus,  the  changing  susceptibility  of  cortical  neurons  in  layer  4  to  APV  application 
is  generally  correlated  with  the  period  in  which  segregation  of  the  geniculocortical 
afferents  occurs.  However,  the  fact  that  the  superficial  cortical  layers  remain  highly 
sensitive  to  NMDA  receptor  blockade  after  the  critical  period  draws  to  a  close  is 
difficult to reconcile with a simple view of the participation of NMDA receptors in
the events of activity-dependent segregation and visual cortical plasticity.

If  NMDA  receptors  are  to  contribute  to  the  mechanism  underlying  synaptic 
rearrangements  during  the  critical  period,  then  pharmacological  blockade  of  the 
receptor  might  be  expected  to  prevent  the  segregation  of  LGN  axons  into  ocular 
dominance  patches  within  layer  4  in  a  fashion  analogous  to  that  found  for  the  de¬ 
segregation  of  stripes  in  three-eyed  frogs.  At  present,  this  possibility  has  not  been 
investigated  in  the  mammalian  visual  system.  A  correlate,  that  receptor  blockade 
might  prevent  the  shift  in  ocular  dominance  toward  the  open  eye  caused  by  monoc¬ 
ular  eye  closure,  has  been  studied  by  using  minipumps  to  infuse  APV  into  the  cat 
visual cortex during the critical period. Within the infusion zone, a shift towards
the open eye was prevented and, in fact, there was an unanticipated shift in
favor of the closed eye, reminiscent of the results obtained in a similar experiment
described above in which muscimol^^ was infused in order to selectively silence the
postsynaptic  cortical  neurons  without  also  blocking  presynaptic  afferent  inputs.  At 
first  glance,  then,  these  results  would  seem  to  conform  nicely  to  the  hypothesis 
that NMDA receptors play a specific role in activity-dependent cortical development
and plasticity. Unfortunately, an alternative interpretation exists, namely
that APV acts in a nonselective fashion to block postsynaptic activity, much as
muscimol does; that is, current flowing through an NMDA-gated channel is not
exclusively a “plasticity” signal. This alternative interpretation seems quite likely in
view  of  the  results  of  the  iontophoresis  experiments  described  above  demonstrat¬ 
ing  that  activation  of  NMDA  receptors  is  necessary  for  cortical  neurons  to  respond 
normally  to  visual  stimulation.  Thus,  at  present,  it  is  not  possible  to  draw  strict 
parallels  between  the  requirement  for  NMDA  receptor  activation  in  hippocampal 
LTP and an analogous role in visual cortical plasticity during development.

Even  if  a  specific  role  for  the  NMDA  receptor  during  development  of  the  visual 
cortex  is  eventually  clearly  established,  the  synaptic  basis  for  its  mode  of  action 
remains  to  be  elucidated.  Clues  come  from  experiments  performed  on  rat  visual 
cortical  slices  in  vitro  which  suggest  that  synaptic  connections  can  undergo  LTP 
following  appropriate  tetanic  stimulation  of  the  white  matter  (which  contains  the 
incoming LGN axons), both during neonatal life and in adulthood. LTP is
much more difficult to induce in cortical neurons than in the hippocampus (only
about 30% of all recorded neurons demonstrate the phenomenon in cortex), and, in
fact, frequently requires a concomitant reduction in local inhibitory influences by
application of a GABA antagonist; nevertheless, as in the hippocampus, LTP can be
blocked consistently by iontophoresis of APV. However, unlike in the hippocampus,
the circuitry of the cortex makes it difficult to stimulate an isolated excitatory
pathway in order to separate monosynaptic from polysynaptic inputs. Thus, while
the LTP studied in hippocampus clearly involves a change in the efficacy of a
single type of excitatory synapse, what is called LTP in cortical slices may involve
a mixture of several effects, both excitatory and inhibitory. Nevertheless, these
observations  raise  the  possibility  that  a  cascade  of  physiological  and  biochemical 
events  similar  to  those  known  to  occur  during  hippocampal  LTP  might  also  take 
place  during  activity-dependent  strengthening  of  visual  cortical  connections  during 
development. 

In the formation of ocular dominance columns, both normally during development
and when perturbed by abnormal visual experience, some connections are
strengthened, but others must be weakened and likely even eliminated in order for
neurons in layer 4 to become monocularly driven. While a mechanism such as LTP
could help to explain synaptic strengthening in the visual cortex, what about the
reverse? A recent experiment by Artola et al. suggests that it may be possible to
produce  a  weakening,  or  long-term  depression  (LTD),  of  synaptic  transmission  in 
neurons  in  slices  of  rat  visual  cortex.  These  authors  suggest  that  a  level  of  mem¬ 
brane  depolarization  above  resting  level  but  below  the  greater  level  required  for 
the  induction  of  LTP  can  produce  LTD  in  active  synapses.  (A  similar  phenomenon 
has been described in the hippocampus by Stanton and Sejnowski^'.) Moreover,
Artola et al. report that LTD can be produced even in the presence of APV, consistent
with the idea that activation of an NMDA-gated channel is not involved.
This result may help to explain why monocular deprivation combined with cortically
infused muscimol^^ or APV'* causes an ocular dominance shift in favor of
the closed eye within the infusion zone. Perhaps in the presence of these agents,
activation of inputs from the open eye brings cortical neurons only to a level of
membrane potential critical for LTD, consequently weakening those inputs. While
these experiments provide a convenient conceptual framework for thinking about
how activity-dependent synaptic change may occur during visual cortical development,
it will be essential first to understand these effects in vitro at the level of
single identified synapses and next to demonstrate that similar alterations in synaptic
efficacy indeed take place in vivo during the critical period as a consequence of
natural  visual  stimulation. 
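
The two-threshold idea described above can be made concrete with a small sketch. It is offered only as an illustration of the reasoning, not as the model used by Artola et al.: the function names, the threshold values (in millivolts), and the learning rate are all hypothetical, chosen merely so that an intermediate level of postsynaptic depolarization weakens an active synapse while a larger depolarization strengthens it.

```python
def plasticity_sign(v_post_mv, presyn_active, theta_ltd_mv=-60.0, theta_ltp_mv=-45.0):
    """Return +1 (LTP), -1 (LTD), or 0 (no change) for a single synapse.

    The threshold values are hypothetical placeholders, not measurements.
    """
    if not presyn_active:
        return 0          # only active synapses are modified
    if v_post_mv >= theta_ltp_mv:
        return 1          # strong depolarization: potentiation
    if v_post_mv >= theta_ltd_mv:
        return -1         # intermediate depolarization: depression
    return 0              # near rest: no change


def update_weight(weight, v_post_mv, presyn_active, rate=0.05):
    """Apply one plasticity step and keep the weight within [0, 1]."""
    new_weight = weight + rate * plasticity_sign(v_post_mv, presyn_active)
    return min(1.0, max(0.0, new_weight))


# Open-eye input reaching a normally responsive neuron versus a neuron
# whose response is damped (as proposed for muscimol or APV infusion).
print(update_weight(0.5, -40.0, True))   # potentiation branch
print(update_weight(0.5, -55.0, True))   # depression branch
```

Under these assumptions, the same open-eye input that would normally potentiate its synapses instead depresses them when the postsynaptic response is held in the intermediate range, which is the interpretation suggested above for the paradoxical shift toward the closed eye.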
GENERALITY OF ACTIVITY-DEPENDENT DEVELOPMENT IN
THE  CENTRAL  NERVOUS  SYSTEM 

The  experiments  discussed  thus  far  provide  compelling  evidence  in  favor  of  the 
idea  that  activity-dependent  competitive  interactions  in  the  visual  system  can  ac¬ 
count  for  the  establishment  of  highly  segregated  and  topographically  ordered  sets 
of  connections  during  ocular  dominance  column  development  in  mammals,  and  in 
the  regeneration  and  maintenance  of  retinotectal  connections  in  lower  vertebrates. 
Moreover,  a  variety  of  new  experiments  has  begun  to  draw  exciting  parallels  be¬ 
tween  the  cellular  bases  for  these  events  and  those  thought  to  underlie  LTP  in  the 
hippocampus.  A  common  thread  in  all  these  examples  is  that  synaptic  change  can 
be  produced  by  the  appropriate  patterning  of  presynaptic  activity  and  its  conjunc¬ 
tion  with  postsynaptic  activity.  In  the  hippocampus,  when  these  requirements  are 
met,  evidence  suggests  that  the  resulting  alterations  may  subserve  memory  and 
learning.^  In  the  postnatal  visual  system,  they  subserve  synaptic  rearrangements 
that  are  generally  dependent  upon  visual  stimulation  in  order  to  provide  the  presy¬ 
naptic  correlations  in  neural  activity  necessary  to  preserve  topographic  relations, 
and  the  asynchrony  required  for  ocular  segregation. 

Studies  of  the  development  of  connections  between  retinal  ganglion  cells  and 
their target neurons in the LGN suggest that structured activity may even play a
role  long  before  vision  is  possible.  As  mentioned  early  in  this  article,  in  the  adult, 
ganglion  cell  axons  from  the  two  eyes  project  to  each  LGN,  where  they  terminate 
in  strictly  segregated  eye-specific  layers.  These  layers  are  not  present  initially  in 
development  but  rather  emerge  as  retinal  ganglion  cell  axons  from  the  two  eyes 
remodel  their  terminals®^  (see  Figure  2).  In  the  cat  and  monkey  visual  system, 
the  period  during  which  the  layers  form  is  entirely  prenatal.  It  begins  before  all 
photoreceptor  cells  become  postmitotic  and  is  complete  before  photoreceptor  outer 
segments are present. Nevertheless, many lines of evidence suggest that here too
segregation comes about by a process of activity-dependent synaptic competition.
The idea that competitive interactions of some form might govern layer formation
originates with observations that removal of one eye during development permits
axons from the other eye to occupy the entire LGN.'*'* ®^ Hints that the competition
might be mediated by synaptic interactions come from physiological observations
that individual LGN neurons initially receive binocular inputs when the optic nerves
are electrically stimulated in vitro^^ and that retinal ganglion cell axons from one
eye can make synaptic contacts in regions later exclusively innervated by axons
from the other eye.** ®^ These observations provide evidence to suggest that synaptic
remodelling  accompanies  the  formation  of  the  eye-specific  layers  in  the  LGN. 

What  might  be  the  source  of  activity-dependent  signals  during  these  early  times 
in  development  when  vision  is  not  possible?  The  most  likely  source  is  the  spon¬ 
taneously  generated  activity  of  retinal  ganglion  cells.  In  a  technically  remarkable 
experiment,  Galli  and  Maffei*^  succeeded  in  making  microelectrode  recordings  from 
fetal rat retinal ganglion cells in vivo and found that they fired spontaneously, sometimes
in correlation with one another when several cells were recorded together on the
same electrode.^® Recently it has been possible to examine the spatial and temporal
pattern of firing of up to 100 retinal ganglion cells simultaneously by removing
fetal  and  neonatal  retinae  and  recording  in  vitro  using  a  multielectrode  array:  re¬ 
sults  show  that  even  in  the  absence  of  photoreceptor  function,  ganglion  cells  fire  in  a 
very  stereotyped  bursting  pattern,  with  neighboring  cells  firing  in  near  synchrony.^® 
These  two  experiments  together  provide  evidence  that  the  spontaneous  activity  of 
retinal  ganglion  cells  may  have  the  appropriate  spatiotemporal  patterning  to  pro¬ 
vide  necessary  activity-dependent  cues  for  the  formation  of  topographically  ordered 
and segregated inputs to the LGN and other central visual targets of ganglion cell
axons. 
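
The kind of spatiotemporal patterning at issue, in which neighboring cells are strongly correlated and distant cells are not, can be illustrated with a toy simulation. The sketch below is not a model of the retina and uses no recorded data: it simply assumes that each cell in a one-dimensional row sums a few adjacent, independent noise sources, so that nearby cells share input. The function names and all parameters (`n_cells`, `n_steps`, `spread`, `seed`) are hypothetical.

```python
import random


def simulate_activity(n_cells=12, n_steps=2000, spread=3, seed=0):
    """Each cell sums `spread` adjacent noise sources, so neighbors share input."""
    rng = random.Random(seed)
    traces = [[] for _ in range(n_cells)]
    for _ in range(n_steps):
        drive = [rng.gauss(0.0, 1.0) for _ in range(n_cells + spread - 1)]
        for i in range(n_cells):
            traces[i].append(sum(drive[i:i + spread]))
    return traces


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mean_correlation_at(traces, distance):
    """Average correlation over all cell pairs separated by `distance`."""
    pairs = [(i, i + distance) for i in range(len(traces) - distance)]
    return sum(pearson(traces[i], traces[j]) for i, j in pairs) / len(pairs)


traces = simulate_activity()
for d in (1, 2, 5):
    print(d, round(mean_correlation_at(traces, d), 2))
```

In this toy setting the correlation falls off with separation and vanishes once two cells no longer share any input. A correlation-based learning rule could in principle use such a falloff as a cue to topographic neighborhood, which is the role proposed above for spontaneous retinal activity.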

If  spontaneous  activity  does  play  a  role  in  the  segregation  of  retinal  ganglion 
cell axons into the eye-specific layers within the LGN, then blockade of such activity
should prevent the formation of the layers. Minipump infusions of TTX into the
thalamus of fetal cats, indeed, block layer formation'^'^ and correspondingly perturb
the branching pattern of individual retinal ganglion cell axons so that branches are
no longer restricted to appropriate zones within the LGN.'° Indeed, the effects of
TTX on the shapes of retinal ganglion cell axons in the cat are remarkably similar
to its effects on ganglion cell axons in the optic tectum of three-eyed frogs, as
shown in Figure 7. However, a criticism of the results is that TTX may have acted
in a non-specific fashion to cause unregulated growth of the axons. Definitive
proof that this is not the case requires an experiment analogous to that performed
by Stryker and Strickland, in which the patterning of neural activity is specifically
perturbed. This should be possible in the future, when the mechanisms for the
generation  of  synchronous  bursting  among  retinal  ganglion  cells  are  better  under¬ 
stood.  Meanwhile,  it  should  be  noted  that  ganglion  cell  axon  growth  is  not  entirely 
unregulated  in  the  presence  of  TTX:  the  axons  are  still  capable  of  detecting  and 
stopping  their  growth  at  the  LGN  boundaries. 

The  results  of  the  experiments  described  above  permit  an  important  generaliza¬ 
tion  concerning  the  universality  of  activity-dependent  synaptic  interactions.  During 
normal development, such interactions may be driven not only by the normal pattern
of use (e.g., visually evoked activity), but even earlier, before vision begins, by
patterned spontaneously generated activity. This suggestion raises the possibility
that spontaneously generated activity elsewhere in the CNS during development
may  play  a  similar  role  in  establishing  orderly  sets  of  connections.  If  so,  then  the 
synaptic  changes  produced  by  activity-dependent  interactions  early  in  development 
may be at one end of a continuum of synaptic change, at the other end of which are the
use-dependent  alterations  in  synaptic  strength  associated  with  learning  and  mem¬ 
ory.  Although  the  changes  occurring  during  development  require  major  anatomical 
restructuring  of  axons,  whereas  those  occurring  during  learning  and  memory  are 
more  likely  to  be  confined  to  individual  synapses,^  evidence  presented  here  sug¬ 
gests  that  the  two  types  of  change  may  not  be  all  that  different  in  terms  of  cellular 
mechanisms.  Future  experiments  will  reveal  the  extent  to  which  the  two  areas  of 
investigation  converge,  and  whether  there  are  similarities  at  the  molecular  level 
as well. The existence of similar mechanisms could represent an extremely elegant
solution to the complex problem of establishing and maintaining specific synaptic
connections  throughout  life. 


FIGURE  7  A  comparison  of  the  morphology  of  retinal  ganglion  cell  axons  in  the  fetal 
cat  at  E57  (a)  and  the  three-eyed  frog  (b)  following  TTX  treatment.  In  both  cases,  the 
terminal arbors of the axons are not as restricted as usual: in fetal cats, retinal ganglion
cell  axons  normally  have  terminal  arbors  that  branch  only  in  the  inner  or  outer  half  of 
the  LGN  rather  than  throughout  (compare  with  Figure  2).  In  three-eyed  frogs,  the  arbors 
are  usually  restricted  to  one  stripe  and  do  not  cross  stripe  boundaries  (indicated  by 
dashed lines). Adapted from Sretavan et al.,^° and Reh and Constantine-Paton.®°

Printed with permission from Variability and Motor Control, edited by
K.  M.  Newell  and  D.  Corcos.  Champaign:  Human  Kinetics,  1992. 

We summarize the behavioral, electrophysiological, and immunohistochemical
findings in the sea slug, Pleurobranchaea, and compare these findings to
those  obtained  in  other  invertebrate  animals,  in  higher  animals,  and  in  hu¬ 
mans.  The  findings  show  that  there  is  "massive"  distribution  and  sharing 
of  information  occurring,  respectively,  through  diverging  and  converging 
network  connections. 

We  examine  the  findings  of  reductionist  approaches  and  find  them  inade¬ 
quate  to  answer  the  problems  arising  from  such  widely  distributed,  multi¬ 
functional,  and  highly  converging  networks  whose  activity  may  be  variable. 
Such  findings  indicate  that  “cooperative”  actions  among  groups  of  neurons 
may arise dynamically and nonlinearly in shifting contexts or “consensuses”
of  response  in  which  individual  neurons  may  have  different  functions,  even 
during  times  when  the  behaviors  are  similar.  Control  of  these  systems  is 
emergent,  “fuzzy,”  and  error-prone  rather  than  being  reflexive  or  following 
explicit causes and effects that can be read from the “switchboard” circuit
of the connections between neurons.
A unified theoretical perspective is needed that accounts for both the emergent
and switchboard systems. Two problems apply in both cases: First,
animals  may  have  evolved  highly  specialized  behaviors  whose  underlying 
neural  networks  may  not  necessarily  reflect  generally  applicable  principles. 
Second,  owing  to  their  complexity,  it  may  not  be  possible  to  character¬ 
ize  biological  networks  in  sufficient  detail  to  permit  an  understanding  of 
the  system  through  simulation  of  the  system  itself.  Thus,  we  use  biological 
information  only  as  indications  or  points  of  departure  to  identify  first  prin¬ 
ciples  that  are  not  initially  intended  to  account  for  a  particular  behavior, 
but  to  provide  insights  into  generally  applicable  self-organizing  processes  at 
the  local-neuron  level  that  can  then  be  used  to  understand  how  large-group 
action  emerges. 

We discuss a number of these avenues to examine computationally and biologically,
e.g., (1) error and variation may not only be products of, but may also
be causally related to, the generation of system dynamics. (2) The possibility
that attractors provide avenues for energy or error minimization yields
mechanisms  from  which  emerge  many  important  building  blocks,  e.g.,  the 
ability  of  groups  of  synapses  to  encode  different  categories  of  information 
simultaneously;  threshold  effects  that  enhance  system  function;  and  input 
signal dynamics which not only carry encoded information but also provide
a  variety  of  search  strategies  for  locating  attractor  basins.  (3)  Minimal  net¬ 
work  architectures  may  be  identified  that  permit  bifurcation  into  different 
dynamical  states.  (4)  Computer  graphical  analysis  of  spatio-temporal  activ¬ 
ity  may  show  how  different  attractors  are  established  and  move  and  merge 
in  space  and  time.  (5)  Competition  between  synapses  may  continuously 
sculpt and readjust network connections to changing conditions.

1. INTRODUCTION: GRAND UNIFICATION THEORIES

Much  of  our  discussion  here  will  address  the  functional  meaning  of  divergence  and 
convergence of connections among neurons. At the simplest level, both are anatomically
definable; divergence occurs when a single neuron sends synaptic[1] projections
to  many  other  neurons,  and  convergence  occurs  when  many  neurons  send  projec¬ 
tions  onto  a  common  follower  neuron.  A  more  functional  definition  is  to  say  that 
divergence distributes information, whereas convergence produces sharing of information.
The consequence of divergence is to increase the size of the co-functional
group  of  neurons,  but  this  alone  would  only  produce  a  set  of  independent  pro¬ 
cessors.  In  parallel  programming,  the  programmer  breaks  down  a  problem  into 
different  components  and  then  assigns  each  component  to  a  different  processor: 
the  programmer  distributes  the  components,  but  the  processors  act  independently. 
Similarly,  there  may  be  multiple  sites  of  learning,  perhaps  arising  from  divergence 
of  input-stimulus  pathways  onto  many  different  cells,  and  each  site  may  involve 
different  cellular  mechanisms,  but  unless  there  is  some  interaction  or  convergence, 
each  site  processes  information  independently.  Because  of  its  potential  for  sharing 
information,  convergence  forces  many  neural  sites  to  work  interdependently.  Thus, 
convergence  lies  at  the  heart  of  our  definition  of  parallel  processing  in  biological 
systems, as it does in simple connectionist neural networks^**® that have little
resemblance  to  biological  ones. 
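
The anatomical definitions of divergence and convergence given above amount to counting, respectively, the distinct targets of each neuron and the distinct sources onto each neuron. A minimal sketch follows, under the assumption that a network is represented simply as a list of (presynaptic, postsynaptic) pairs. The neuron labels and function names are hypothetical and do not refer to any identified cells.

```python
from collections import defaultdict


def fan_out(edges):
    """Divergence: number of distinct follower neurons per presynaptic neuron."""
    targets = defaultdict(set)
    for pre, post in edges:
        targets[pre].add(post)
    return {neuron: len(t) for neuron, t in targets.items()}


def fan_in(edges):
    """Convergence: number of distinct presynaptic neurons per follower neuron."""
    sources = defaultdict(set)
    for pre, post in edges:
        sources[post].add(pre)
    return {neuron: len(s) for neuron, s in sources.items()}


# Neuron A diverges onto B, C, and D; these in turn converge onto E.
edges = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "E"), ("C", "E"), ("D", "E")]

print(fan_out(edges))
print(fan_in(edges))
```

In this example B, C, and D would be independent processors if the projections onto E were removed. It is the convergence onto E that obliges their outputs to be combined, which is the sense in which convergence is said above to produce sharing of information.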

In attempting to understand the functional implications of divergence and convergence
even in small networks, Pribram's^^^ analogy to holography for distributed
memory  storage  seemed  a  possibility,*'*^  particularly,  as  Mpitsos  and  Cohan*"^®  later 
reported,  since  some  networks  are  able  to  reorganize  similar  motor  output  patterns 
of  activity  after  neurons  are  removed  that  appear  to  control  the  pattern  of  activ¬ 
ity  going  to  motor  neurons.  In  these  studies,  the  neuron  was  removed  from  taking 
part  in  the  motor  pattern  by  hyperpolarizing  it  below  its  firing  threshold.  This  pro¬ 
duced  two  types  of  errors:  cessation  of  firing  in  the  motor  neurons  that  it  controlled, 
and  cessation  of  all  motor  activity.  Eventually,  the  original  pattern  recovered  even 
though  the  hyperpolarized  neuron,  and  the  motor  neuron(s)  it  drove,  did  not  take 
part in the reformed motor pattern. Since the overall firing pattern in the reformed
activity  in  the  motor  roots  appeared  similar  to  the  original  pattern,  it  seems  rea¬ 
sonable  that  the  error  was  somehow  distributed  throughout  the  generator  network. 
By  analogy  to  holography,  the  “picture”  of  activity  emerging  from  the  memory  dis¬ 
tributed  among  the  pattern-generating  neurons  exhibited  graininess  when  bits  of 
information were lost rather than exhibiting holes or gaps in some regions while retaining
high resolution in others as would occur in some neural networks.’**® We use
“graininess”  here  because  fewer  neurons  became  involved  in  the  reformed  pattern 
than  in  the  original  one,  yet  the  overall  structure  of  the  pattern  seemed  the  same. 


[1] We use the terms “synaptic projections” and “connections” to refer both to well-defined pre- and
postsynaptic structures involving localized transmitter release and to morphologically indistinct
structures involving diffuse transmitter release.

There are problems even with the notion of holography, and in carrying the analogy
of graininess too far, but for the present purposes, the real question that these
studies  point  to  is  one  of  memory  storage  and  control  in  high-dimensional  systems. 
The  high  dimensionality  that  we  refer  to  is  not  just  in  the  number  of  interacting 
components.  It  also  includes,  as  we  shall  discuss,  the  storage  of  different  forms  of 
information  within  the  same  set  of  synapses  and  nonlinear  ways  of  addressing  it. 

While  it  is  easy  to  see  high  dimensionality,  and  the  consequences  of  it,  in  the 
human  cortex,  it  has  not  been  so  easy  to  admit  that  it  exists  in  animals  that 
neuroscience  persists  in  calling  “simple.”  A  world  view  that  polarizes  animals  into 
simple  and  complex  (into  generalizations  relating  to  invertebrate  and  vertebrate 
phyla) emerged; e.g., see comment in Edelman.^^ A wide variety of factors, including
the technology of intracellular microelectrode recordings,* the ability to use
these recording methods on cells that can be identified in different experimental
preparations, findings showing that activity is encoded within the central nervous
system itself for generating patterned motor activity,*^® the importation of the
ethologist’s'*'* fixed-action pattern (FAP), identification of functional types of cells
such as command neurons that control central pattern generators and stereotyped
behaviors or FAPs,'*® *'*'* and the related findings showing that much of this activity
is genetically encoded,**^ worked together to entrench reductionism. Though each
finding remains useful in its own right, concepts developed from reductionist single-neuron
methods have proved inadequate to understand distributed, multifunctional,
and  variable  systems. 

It  is  an  interesting  discovery  that  many  biological  systems,  being  potentially 
high  dimensional,  may  generate  complex  behavior  that  is  governed  by  relatively 
low-dimensional dynamics.[2] Chaotic systems fall into this category, and, because
of their complex response dynamics, have been a subject of considerable attention
over the past ten years.*'*'* ’^■* *^^'*^^ We shall summarize some of these efforts. But
rather than dealing with the verifiability of chaos itself or of any dynamic process,
which has already been addressed sufficiently elsewhere,*^^ what we wish to do
here is to address common features of all nervous systems which give rise to or
exclude the ability of the systems to produce particular response dynamics. This is
to say that the important features are not so much whether repetitive activity, as
one example, is generated by limit-cycle or chaotic dynamics, as they are the system
characteristics that permit different activities to arise.

[2] There is often no need to go beyond a definition of dynamics simply as “time-dependent
variations of activity,” though there are different forms of dynamics. Rather than presenting a
formal definition, we shall introduce various ideas that modify our standard working definition as
they arise in the course of the discussion.

It may be useful to forewarn the reader that our own perspective of brain function,
or of the function of systems composed of aggregates of nonlinearly interacting
components, has two parts, one experimental and the other philosophical. It is essential,
of course, that the philosophy or theory one holds about the actions of a system
have a foundation in hard biological fact. However, problems arise when doing
only that. Take just one example: All visual systems use on-responses to respond to
the onset of light, and off-responses to respond to the off-set of light. But knowing
the cellular and physiological mechanisms that generate off-responses in some molluscs
would lead one completely astray about the mechanisms that produce them
in vertebrate animals. Evolutionary selection mechanisms tend to optimize
the adaptive[3] mechanisms in each organism. Thus, owing to diversification and
optimization, it is often difficult to determine what features permit generalization
across organisms, or for that matter, across integrative systems within an organism
because the various systems may have developed under different evolutionary constraints.
It is possible to argue in favor of comments one might find in print, which
go something like this: Owing to the observation that evolution conserves mechanisms,
what we understand of mechanisms of learning in a simple animal such as
a sea slug will allow us to understand the mechanism of learning in humans. But
to take that argument is to forget the equally important fact that diversification is
a crucially important driving force in biological evolution, not only through variations
arising from random factors, but also through deterministic low-dimensional
factors whose dynamics gives them a life of their own.

As  neurobiologists,  we  are  interested  in  the  integrative  mechanisms  of  sea  slugs, 
crayfish, insects, leeches, lampreys, or humans. But from a broader perspective, we
wish to ask whether there are scale-independent principles, namely, ones that apply
to different levels of organization, from chemical processes to cellular, organismal,
and social ones. The question is: Can we identify unifying principles, as one might
say of the attempts to establish grand unification theories (GUTs) in physics?
Unfortunately, biological systems are too complex and uncontrollable to permit such
a synthesis presently, as we shall try to show in the present paper. One possibility
is to conduct computer simulations of models that reduce a particular biological
system within the bounds of definable characteristics. While this may give insight
into  mechanisms  pertaining  to  that  system,  it  may  not  provide  much  insight  into 
general  principles. 

An alternative simulation approach is to use biological information as "points
of departure" to conduct computer simulations that do not necessarily attempt to
replicate the structure or function of any particular biological system. We go further
to suggest that it might be useful to use simulation systems that are actually
extreme caricatures of biology, but which nonetheless might generally give insight
into biology. Eventually, what we hope to do is to obtain some idea about how network
architecture incorporates various linear and nonlinear interactions between
neurons to allow the network, as a whole, to generate different types of response
dynamics. We want also to understand how these fundamental network principles

[1] The term "adaptive" implies some conformation of a system (biological or computational) that
allows it to survive in its environment. The process of conforming, as we shall discuss in detail in
Section 7, may represent a gradient descent in the error of the response with respect to the response
required for survival, or in the energy required to generate the response. That there may be local
minima in such conformations indicates that there may be non-optimal ways of responding, and,
conversely, it indicates that there may also be an absolute minimum representing some optimal
way that the system might respond for a given environmental demand, though local minima may
be sufficient for survival.
become sculpted selectively to produce the neural responses observed in individual
animals. The neural architecture in individual organisms may retain more or less
of these primal features, as required or permitted by the tasks presented for adaptive
fitness. Thus, by seeking to identify common principles from which different
mechanisms may emerge, we are joining a call to reconsider the importance of
comparative biology, a subject which has suffered as research has become entrenched
in  animal-specific  encampments.  But,  as  we  hope  will  become  apparent,  our  efforts 
will  not  be  to  determine,  for  example,  whether  command  processes  are  the  same 
in  different  animals  or  to  define  the  command  process  more  exactly.  As  important 
as  such  issues  are,  we  shall  nonetheless  aim  to  address  comparisons  at  a  broader  or 
more  abstract  level.  Much  of  our  discussion  here  will  center  on  making  analogies 
through  commonality  in  dynamical  principles  rather  than  in  mechanisms. 

There are, of course, many people who, in one way or another, have addressed
the question of how cooperative action arises among groups of intercommunicating
individuals. The works of Grossberg, for example, on neural networks and
the mathematical foundation of many psychological phenomena are too numerous
even to summarize adequately. It is a theme of modern neural network
connectionism, in studies of chemical dynamics, and in studies of the mammalian
nervous system. In many biological aspects, it can be traced back to Darwin,
and to Aristotle. Such works notwithstanding, we shall attempt to show in the
present discussion that a unifying theory of how neurons (or individuals of any
type) act cooperatively within a group is presently lacking. Along the way we shall
also attempt to identify ways for continuing the search for unifying principles.

In the course of this paper we shall first describe the behavioral, physiological,
and immunohistochemical studies in our experimental system, the sea slug
Pleurobranchaea, and then compare these results to those obtained in other
invertebrate animals and in vertebrates. Another gastropod mollusc, the sea slug
Aplysia, has been the focus of reductionist researches in many laboratories that
have attempted to explain animal behavior and associative learning in terms of
definable reflexes. Section 6 deals with reductionism; we examine these findings, show
the difficulties that have arisen, and then reassess them from the point of view
of parallel-distributed processing. Given growing interest in nonlinear dynamics in
mathematical and physical models, we examine the viability of applying tools
arising  from  these  studies  to  biological  systems.  In  Section  8,  we  suggest  computer 
methods  which  might  give  some  insight  into  how  the  integrated  activity  of  large 
numbers of neurons might arise from interactions occurring locally between individual
neurons. Thanks to the work of René Thom, we use a call from Aristotle
to summarize the intent of our own work begun two decades ago: “Ἄλλην ἀρχὴν
ἀρχόμενοι,” namely, “Now let us make a fresh start,” at least to point out what it
is that traditional thinking in neurobiology does not address sufficiently, and what
the problems are in progressing further.
2. FINDINGS IN A SEA SLUG

2.1  BEHAVIOR 

Pleurobranchaea is a large sea slug, a member of the opisthobranch gastropod molluscs,
ranging in size from a few millimeters to tens of centimeters, depending on
its age. Its general body features resemble a snail, though like land slugs, it has
no shell (see photographs in Mpitsos). The animal exhibits a relatively
large repertoire of behaviors,[2] including righting when turned upside down, defensive
withdrawal, mating, egg-laying, feeding, and a variety of other mouth-related
behaviors involving the mouth, lips, jaws, and radula (a structure analogous to
a tongue). Feeding behavior usually has dominance over the other behaviors. For
example, animals normally withdraw from tactile stimuli applied to their head regions,
but in the presence of food, withdrawal responses are suppressed in
feeding-motivated animals. The most obvious feature of the feeding behavior is the
rapid bite-strike response in which the entire jaw structures comprising the proboscis
are rapidly thrust out to bite at a food object and then rapidly withdrawn.
Feeding also consists of bite-ingestion movements in which food is grasped and then
sequentially drawn into the mouth cavity largely through cyclical inward and outward
movements of the radula and coordinated movements of the anterior regions
of the jaws and mouth. A third stage of feeding consists of swallowing movements
in which food is passed from the buccal cavity through the esophagus and then into
the stomach. The bite-ingestion and swallow components of feeding are excellent
for neurophysiological work because of their oscillatory characteristics, much
as might happen in humans during opening and closing of the jaws and related
movements of the tongue. Because of the sequence of oscillations, the behavior persists
and is amenable to analysis, whereas single-shot behaviors such as withdrawal
are more difficult to analyze. However, as in humans, the number of cycles that
the animal may exhibit during a single bout of bite-ingestion and swallow is often
short and possibly nonstationary in its temporal characteristics, which, as discussed
below, poses difficult problems in studies aimed at understanding the dynamics of
the  behavior. 

The jaws, radula, mouth, and lips of the animal generate many different and
variable behaviors. These include several components of feeding, regurgitation,
and defensive biting, among others. The animal also exhibits self- and
inter-animal gill grooming, but we presently have no way to evoke gill-grooming
behavior reliably. However, of all its behaviors, inter-animal gill-grooming is particularly
interesting because Pleurobranchaea is cannibalistic, raising questions into


[2] The ensuing discussion also relies on the term “behavior,” and identifies a number of behaviors
within the repertoire of what the animal can do. For the moment, we use “behavior” to refer
specifically to a definable response of the animal, or generically to some unspecified but potentially
identifiable response. We shall see by subsection 3.7, however, that the definition of behavior, of
behavioral repertoire, and of behaviorally multibehavioral or multifunctional systems (ones that
can produce different behaviors using the same sets of neurons) needs to be revised to take into
account the consequences of variation in “contexts” of neuronal group action.
the mechanisms that turn carnivorous feeding mouth, radula, and jaw movements
into  cleaning  movements. 


2.2  NEUROPHYSIOLOGY 

2.2.1 KEY FEATURES OF ALL MOUTH-RELATED BEHAVIORS CAN BE EXAMINED
THROUGH A SMALL POPULATION OF NEURONS, THE BCNS The cerebropleural
ganglion (“brain”) of Pleurobranchaea innervates the mouth and anterior head regions,
whereas the buccal ganglion innervates the muscles that move the jaws and
radula.  Thus,  coordination  of  buccal-oral  behaviors,  namely  ones  that  involve  both 
the  buccal  structures  and  the  mouth  and  lips,  must  happen  through  these  ganglia. 

The only way this can happen is through the buccal-cerebral neurons (BCNs),
of which there are approximately 15-20 in each of the two buccal hemiganglia.
The BCNs are unique because they are: (1) the only cells in the buccal ganglion
that project to the brain, except for two bilaterally paired giant neurons whose
function is presently unknown, and (2) either directly involved in the central
pattern generator for the buccal behaviors or intimately involved in
controlling it. There may be other oscillators located in the brain, but by
comparison to the effect of the BCN oscillator, other oscillators have weak effects.
The BCNs and the two giant cells are the only sources of information to the brain
about processes in the buccal ganglion. All of the behaviors involving movements
of the mouth and lips in coordination with the tongue and jaws must act through
BCNs, and since the BCNs are part of the central pattern generator, they do more
than  perform  coordination  of  the  different  motor  centers. 

Although the various mouth-related behaviors may involve thousands of neurons,
key features of the information required to generate these behaviors may be
obtained from much smaller subsets of neurons consisting primarily of the BCNs
and some of the neurons with which they interconnect. Thus, the BCNs acting
individually and as a group are multifunctional because they must generate activity
pertaining  to  multiple  behaviors. 

2.2.2 CONNECTIVITY OF THE BCNS Figure 1 summarizes the BCN connections.
The evidence for these connections has been described in several publications.
Present evidence indicates that they connect with one another
primarily polysynaptically, as indicated by the interneurons in Figure 1; however,
many of these polysynaptic connections may be through other BCNs. In a few cases
there may be mutual inhibitory connections between the BCNs, but the exact
connectivity, if it can be defined, remains for further study. As indicated schematically
in Figure 1, many BCNs converge onto the same target motor neurons, and
individual BCNs diverge onto different motor neurons. In turn, the motor neurons
feed back to the BCNs that drive them. An identified group of neurons
in the brain, the paracerebral neurons (PCNs), converge onto the BCNs, and the
BCNs feed back to the PCNs.
FIGURE 1 Cartoon showing central features of converging and diverging connections
in Pleurobranchaea nervous system. BCN: buccal-cerebral neuron. I: interneuron. M:
motor neuron. PCN: paracerebral neuron. Size of each of these pools of neurons is
about 10 to 20 units each. There are many more motor neuron pools, one for
each motor root; some cells send axons out multiple roots. R1:
motor root that innervates muscles for opening jaws. R3: motor root for closing jaws.
Motor roots of brain are not shown. For clarity of presentation, the BCN-motor neuron
connections are shown on the left, and BCN-PCN connections are shown on the right.
Reprinted with permission from Brain Res. Bull. 21 (1988): 529-538.


The  actual  biological  network  is  much  larger  and  more  interconnected  than 
shown  in  Figure  1.  For  example,  there  are  different  pools  of  neurons  that  send  axons 
out  of  the  brain  through  the  various  motor  roots,  of  which  there  are  approximately 
a  dozen  on  each  side  of  the  brain,  though  some  motor  neurons  send  axons  out 
different  roots.  Additionally,  it  is  necessary  to  consider  that  there  are  numerous 
pools  of  interneurons.  Thus,  the  number  of  converging  and  diverging  connections  in 
the  brain  and  buccal  ganglion  is  quite  large.  Moreover,  just  as  there  are  interactions 
between  the  brain  and  buccal  ganglion,  there  are  interconnections  between  the  brain 
and  other  ganglia.  Therefore,  the  extended  network  consisting  of  neurons  affecting 
the  BCNs,  and  ones  that  the  BCNs  affect,  involves  hundreds  of  neurons. 

What  we  hope  to  achieve  in  our  present  line  of  work  is  to  add  neuron  pools 
to  the  core  model  shown  in  Figure  1.  We  want  especially  to  obtain  the  temporal 
relationships  in  the  firing  of  as  many  of  the  neurons  as  possible,  partly  to  use 
the  data  to  reassess  the  conclusions  we  have  already  reached,  and  partly  to  use  it 
to  obtain  some  insight  into  how  such  large  numbers  of  neurons  interact  with  one 
another.  The  time  of  firing  of  all  BCNs  and  PCNs  is  being  extracted  from  multiple 
recordings  conducted  simultaneously  at  different  extracellular  sites  along  the  nerves 
that  connect  the  brain  and  buccal  ganglia  (the  cerebro-buccal  connectives,  CBCs). 
Since activity occurs in both directions in the CBC, the multiple recording sites
allow us to determine the direction of propagation of firing in different nerve fibers,
and thereby to distinguish between the BCNs and other neurons. It is only a matter
of  extended  labor  to  include  the  time  of  firing  of  motor  neurons  in  the  different 
motor  roots. 
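
The direction-of-propagation idea described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the spikes of one nerve fiber have already been sorted and time-stamped at two sites on the connective, one nearer the buccal ganglion and one nearer the brain; the function name, the 10 ms matching window, and the sample spike times are hypothetical and are not taken from the recordings discussed here. A fiber whose spikes consistently reach the brain-side site later is conducting from buccal ganglion to brain, as expected for a BCN.

```python
from bisect import bisect_left

def propagation_direction(buccal_site, brain_site, max_delay=0.01):
    """Infer conduction direction of one fiber from two recording sites.

    buccal_site, brain_site: sorted spike times (seconds) of the same fiber
    at the site nearer the buccal ganglion and the site nearer the brain.
    max_delay is an assumed upper bound on the conduction time between sites.
    """
    lags = []
    for t in buccal_site:
        i = bisect_left(brain_site, t)
        # Candidate partners: the brain-site spikes just before and just after t.
        candidates = brain_site[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda s: abs(s - t))
        lag = nearest - t
        if abs(lag) <= max_delay:
            lags.append(lag)
    if not lags:
        return "undetermined"
    mean_lag = sum(lags) / len(lags)
    if mean_lag > 0:
        return "buccal_to_brain"   # spike seen first near the buccal ganglion
    if mean_lag < 0:
        return "brain_to_buccal"   # spike seen first near the brain
    return "undetermined"

# Hypothetical spike times with a 3 ms conduction delay toward the brain.
print(propagation_direction([0.100, 0.250, 0.400], [0.103, 0.253, 0.403]))
```

Averaging the lag over many matched spikes, rather than relying on a single pair, is one simple way to tolerate occasional mismatches; real recordings would also require spike sorting across sites, which this sketch does not attempt.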

The  point  of  all  of  this  work,  however,  is  not  to  obtain  a  complete  network, 
but  to  use  the  data  to  assure  that  our  computer  simulations  of  different  model 
assumptions will provide activity that reflects the activity in the biological system.
A  particularly  important  aspect  of  this  work  will  be  to  obtain  an  indication  of  the 
types  of  variations  and  motor  pattern  blending  that  the  system  generates. 

Owing to similarities in their gross neuroanatomical features, which distribute
different functions to the buccal ganglion and to the brain, the principles obtained
in Pleurobranchaea may hold in many other snails and slugs. Moreover, it is likely,
though not demonstrated sufficiently, that neurons analogous to the BCNs in
Pleurobranchaea may have similar functions in all snails and slugs. But it is not clear
presently whether other snails and slugs generate as many mouth-related behaviors
as Pleurobranchaea, and whether the behaviors in these other animals are as
variable. 
3. DISTRIBUTED FUNCTION, MULTIFUNCTIONALITY, AND
VARIATION 

3.1 RATIONALE FOR CHANGE IN CONCEPTUAL FRAMEWORK: SINGLE
CELLS  TO  CONTEXTUAL  GROUPS 

Our initial aim for studying this “simple” sea slug was to understand the cellular
basis of learning. The many control experiments in the studies of Mpitsos and
Collins and Mpitsos, Collins, and McClellan were the first to demonstrate that
sea slugs are capable of Pavlovian and avoidance associative learning, and even earlier
work, though not as extensively controlled, promised that associative learning
could be examined in isolated nervous systems. However, work begun in the mid
1970s closely examined the motor patterns and behaviors, and showed that networks
are multifunctional, being capable of generating different behaviors,
and that similar motor patterns can yield different behaviors.
More importantly, the motor patterns of different behaviors often blend with one
another and the underlying motor patterns of neural and muscular activity are
quite variable. As discussed below, rather than a definable reflex system, it
seemed possible that networks of neurons work by flexible contexts of action. The
variations in the contexts might involve linear regroupings or might arise from
nonlinearities that cause rapid shifts or bifurcations in the patterns of activity generated
by the network. It became apparent that attempts to attribute specific function to a
given neuron, or to locate the engram of a learned behavior to a particular synapse,
could fail.

Consequently, we had to backtrack, to reassess how it is that even innate or
“unlearned” motor patterns arise in such systems before we could address the problem
of how newly learned information is incorporated into the network. Although
we continued to conduct learning studies after the observations made in the mid to
late 1970s, our rationale for doing them has not been to find the locus of learning
at specific synapses, but to determine whether learning could actually be identified
in the responses of reduced preparations. Additionally, given the indication
that information may be distributed over many neurons, it was necessary to
develop the technology for identifying populations of neurons that are involved in
specific aspects of learning among which we could examine how learning affected
cooperative actions among neurons in the population.

The idea of cooperativity, which Freeman and coworkers have used to advantage
in their studies of rabbit olfactory bulb, resembles what we refer to as
“contexts” in neuronal group function. Much of the discussion in this paper will
attempt to present our understanding of functional contexts. Early in the development
of the idea of command neurons (cells that evoke stereotypic behaviors),
Davis and Kennedy showed that each command neuron of the lobster swimmeret
system produces characteristically different effects and selectively controls
different  motor  neurons,  indicating  that  the  command  process  arises  from  group 
action  in  which  each  command  neuron  performs  specific  subtasks  of  the  command 


George  J.  Mpitsos  and  Seppo  Soinila 


process and activates a particular set of motor neurons. Later work, such as the finding
in Pleurobranchaea that command neurons receive feedback connections from
the motor network that they drive, blurred functional distinctions that may be
attributed to single neurons because function seemed to be shared. Davis used the
term “consensus” to refer to the emergent actions that might arise among groups
of interacting neurons. In studies on locust walking, Kien used “consensus” to
refer to variable activity in ensembles of neurons. Our thinking on the ability of
groups of neurons to act contextually includes variation in the effects produced by
individual neurons, by the group as a whole, and in the neurons that constitute
the group. For the present discussion we use the idea of “contexts” interchangeably
with “consensus,” partly because we, too, are inclined to believe that its meaning of
“all or most” is descriptive of what may often take place in the number of neurons
that  become  active  during  normal  behavior. 

Although there are similarities between our use of “contexts/consensus”
and Davis’ and Kien’s use of “consensus,” there are also some important differences
which we shall address. Our definition relies on many factors other than the
number of neurons that become active. Therefore, we hold off a definition, which is
given in subsection 3.8, until we have first presented behavioral examples and provided
discussions of principles relating to variation, dynamics, and nonlinear function.


3.2  CONTEXT  OF  NEURONAL  GROUP  ACTION:  INFERENCES  FROM 
BEHAVIORAL  CHOICE 

The following example may help to explain our use of the term “consensus” (or
“context”): One of the original purposes for studying Pleurobranchaea was to examine
how animals “choose” to perform a particular behavior when confronted
simultaneously by many stimuli that often require conflicting responses, as might
occur in the natural environment. For example, turning an animal upside down
evokes righting behavior having a definable duration. Presenting food to the animal
produces several components of feeding behavior at definable thresholds. When
the animal is turned upside down and presented with food simultaneously, righting times
significantly increase, but feeding thresholds remain constant. By such simultaneous
presentations of different stimuli to evoke pairs of behaviors, it is possible to define
a behavioral hierarchy, and to view the process of establishing the hierarchy as a
reflex system where one behavior inhibits another.

It  is  necessary,  however,  to  go  one  step  further.  Early  studies  on  behavioral 
“choice” indicated that some behaviors seem to blend into one another, as Kirsti
Bellman was to show later in lizards. In Pleurobranchaea, for example, the anterior
portion  of  the  foot  may  start  to  twist  in  order  to  right,  but,  at  the  same  time,  it  may 
begin  to  cup  around  the  descending  solution  of  the  food  stimulus.  The  anterior  foot 
appears  to  be  attempting  to  perform  two  contradictory  behaviors  at  the  same  time. 
Even  when  righting  behavior  starts,  it  is  slowed  because  the  foot’s  motor-system  is 
still  receiving  conflicting  activities,  one  for  righting  and  one  for  feeding.  We  do  not 
deny  that  reflexes  involving  inhibition  can  be  found,  but  doing  that  alone  places 
one’s concepts on the side of the razor’s edge in which behavior, and the underlying
neurointegrations, are viewed as set and repeatedly definable structures. The important
issue to us is the process of forming the behavioral “choice” during the time
that the animal is presented multiple stimuli rather than a stereotyped behavioral
hierarchy. The two approaches speak about the same behaviors but give different
explanations. The contextual approach views behavior as arising fluidly among
many different and blendable behaviors. The reflex approach views the animal as
a generator of a set of fixed-action patterns (FAPs; e.g., Gillette), each relating
to definable and repeatedly identifiable responses in the animal. The definition of
behavioral hierarchy forces one to think of behaving animals as concatenations of
reflexes or FAPs that are repeatedly definable. In the extreme situation in which an
inverted animal lies motionless, neither feeding nor righting, the definition of behavioral
hierarchy would lead one to develop experiments showing inhibition between
feeding and righting sensory-motor systems, as shown for the interaction between
feeding and withdrawal. It would also lead one to identify a particular locus in
the nervous system at which such inhibition takes place. The variability of activity
in Pleurobranchaea, and the high degree of converging and diverging connections
in its nervous system, lead us to believe that such localization of mechanism may
be misleading. By contrast, when taking these factors into account, one’s focus is
directed to dynamically shifting contexts of activity in which the identity and location
of the underlying mechanism for a behavior is not fixed, just as the behavior
may not be fixed and always distinguishable from others. One is more apt to think
of variably emerging networks rather than “switchboard” reflexes.

Thus, although the definition of behavioral hierarchy is useful for categorization,
and although it is defined using the behavioral choice paradigm, it dangerously
excludes  the  dynamics  within  choice-making  processes.  To  be  sure,  reflex  actions 
are  indications  of  a  process,  but  the  reflex  approach  leads  one  to  examine  the 
structure  of  the  network  itself  whereas  an  approach  that  deals  with  the  dynamics 
of  interactions  leads  one  to  examine  principles  of  interaction  from  which  networks 
emerge  not  only  variably  but  also  nonlinearly,  as  we  shall  try  to  illustrate  in  Section 
6,  when  dealing  with  reductionism,  and  in  Section  8  when  dealing  with  computer 
simulations.  Inhibitory  interactions  between  motor  systems  may  be  used  by  both 
explanations, but the dynamical approach uses inhibition either as a potential
explanation that may or may not actually take place, or as a participating variable
in a system that expresses the dynamics. In either of these non-reflex explanations,
the role of inhibition may not be discernible from the structure of the network
itself, though dynamical explanations must also account for conditions that actually
express  reflexes. 


3.3  CONTEXT  OF  ACTION  IN  THE  BUCCAL-ORAL  SYSTEM 

The  buccal-oral  system  of  Pleurobranchaea,  consisting  of  the  lips,  mouth,  radula, 
and  jaws,  seems  to  magnify  variation  and  behavioral  blending  because,  as  noted 
above,  it  is  capable  of  generating  many  different  behaviors  and  variants  within 
individual behaviors. Moreover, blending happens among the various mouth-related
behaviors themselves, as well as with behaviors produced by other motor systems.
A number of studies have provided criteria for identifying motor patterns relating to
particular buccal-oral behaviors. McClellan and Croll and Davis
have  established  specific  motor-pattern  differences  in  electrical  recordings  made 
from  muscles  and  nerves  to  distinguish  between  feeding,  regurgitation,  and  rejection 
behaviors,  but  even  McClellan’s  studies  demonstrated  that  different  behaviors  can 
be  generated  by  similar  motor  patterns. 

Having observed considerable motor-pattern variations, Mpitsos and
Cohan devised a series of associative learning experiments to determine
whether a learned response persisted in even minimally dissected animals. The results
clearly showed that the behaviors of the undissected and dissected, behaving
animals were identical, as determined by direct observation of what the animal
did in response to the applied experimental and control stimuli that were used
in training. However, when examining the electromyographic data alone, obtained
simultaneously while observing the behaviors, it was not possible to identify consistent
differences in the firing patterns of muscles during feeding, regurgitation,
and rejection. The information had to reside within these patterns, but the information
itself could not be read simply by examining the temporal orchestration
of activity in the recorded motor patterns. An alternative explanation is that the
information resides in the dynamics of the neuromuscular system as a whole, i.e., in
the combination of interactions between the motor output, in the nonlinear loading
presented by the muscles and mouth and jaw structures, and in the effect of sensory
feedback to the central nervous system. Such systems may have qualities similar to
damped-driven oscillators whose dynamics are sensitive to changes in parameter
constants that control the effects of different variables (e.g., see the description of
the Duffing oscillator in Thompson). Not inconsistent with this is that the animal
can perform a given behavioral effect successfully using a combination of patterns. In
neural activity, it may be sufficient to have reached an approximating and variable
“consensus” or “context” of action rather than requiring an explicit stereotyped
pattern.
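
As an illustration of what is meant by a damped-driven oscillator whose behavior depends on its parameter constants, the following is a minimal sketch of the Duffing equation in one common textbook form, x'' + delta*x' + alpha*x + beta*x^3 = gamma*cos(omega*t), integrated with a fourth-order Runge-Kutta step. The function name, the double-well choice (alpha < 0, beta > 0), and all numerical values are assumptions chosen for illustration; they are not taken from the cited description or from any measurement on the animal.

```python
import math

def simulate_duffing(gamma, delta=0.3, alpha=-1.0, beta=1.0, omega=1.2,
                     x0=0.5, v0=0.0, dt=0.01, steps=10000):
    """Integrate x'' + delta*x' + alpha*x + beta*x**3 = gamma*cos(omega*t).

    Returns the list of positions x(t) sampled every dt (length steps + 1).
    All default parameter values are illustrative assumptions.
    """
    def accel(t, x, v):
        # Driving force minus damping and the nonlinear restoring force.
        return gamma * math.cos(omega * t) - delta * v - alpha * x - beta * x ** 3

    t, x, v = 0.0, x0, v0
    xs = [x]
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step for the pair (x, v).
        k1x, k1v = v, accel(t, x, v)
        k2x = v + 0.5 * dt * k1v
        k2v = accel(t + 0.5 * dt, x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x = v + 0.5 * dt * k2v
        k3v = accel(t + 0.5 * dt, x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x = v + dt * k3v
        k4v = accel(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        t += dt
        xs.append(x)
    return xs

# Compare the late-time range of motion for a few assumed drive amplitudes.
for gamma in (0.0, 0.2, 0.5):
    tail = simulate_duffing(gamma)[-2000:]
    print(gamma, round(min(tail), 3), round(max(tail), 3))
```

With the drive switched off (gamma = 0) the damped system simply settles into one of its two wells; increasing the single constant gamma changes the character of the motion, which is the kind of parameter sensitivity the analogy to the neuromuscular system relies on. The sketch makes no claim about which regimes correspond to which behaviors.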

The neural sources of some of this variation were identified in studies of isolated
nervous systems that were used in order to remove the influence of sensory
perturbations. For example, neural patterns reemerge even when BCNs that were
initially responsible for generating patterned activity are reversibly removed from
the coactive networks (Figure 5 in Mpitsos), showing that different combinations
of neurons generate similar responses. Similarly, the firing of some BCNs shifts variably
between completely opposite phases of the cycle of opening and closing of the
jaws (Figure 16 in Mpitsos). Graded intermediates may occur as the nervous system
generates patterns of rhythmic activity and spontaneously shifts into another
pattern. 

Our  view  is  that  the  intermediate  and  ^«u:iable  forms  of  activity  give  crucial 
information  about  integrative  mechanisms.  Variations  that  occur  within  group  ac¬ 
tion  must  arise  from  variations  at  the  level  of  individual  neurons.  To  present  these 
ideas,  the  next  two  subsections  discuss  “attractors”  and  “attracting  states,”  and 
the role that different forms of variation and error have in the response properties
of  biological  systems. 


3.4  DEFINITIONS:  MODES  OF  COOPERATIVITY 

3.4.1  ATTRACTORS  AS  DISSIPATIVE  STRUCTURES  An  intuitive  definition  of  at¬ 
tractor  may  be  given  by  examining  the  property  of  attraction.  Suppose  for  the 
moment  that  we  are  dealing  with  a  process  governed  by  three  variables.  The  state 
of the system at any given time is represented by the values of these variables.
The progression of these values over time defines the parameter state space of the
activity of the system. Plots of these variables, one variable in each coordinate of
three-dimensional space, define the phase space. The flow or trajectory from one
point  to  another  provides  a  view  of  the  phase  portrait  of  the  dynamics  of  the  activ¬ 
ity.  For  continuous  periodic  activity,  the  trajectory  is  a  closed  loop.  A  brief  external 
perturbation,  applied  to  one  or  any  combination  of  the  variables,  will  move  the  state 
of  the  system  away  from  the  closed  loop.  If  the  trajectory  then  collapses  asymptot¬ 
ically  back  toward  the  closed  loop,  the  system  may  be  considered  to  be  governed 
by  an  attractor.  The  set  of  all  possible  perturbations,  and  subsequent  dissipative 
responses  shown  by  the  asymptotic  recovery,  define  the  inset  to  the  attractor  or 
its basin of attraction. In the case of periodic activity the attractor is a limit cycle.
The  activity  could  also  be  generated  by  chaotic  attractors  whose  trajectories  are  not 
represented  by  a  limit  set  either  before  or  after  perturbations,  but  by  an  attracting 
set.  An  indication  of  this  set  may  be  viewed  through  the  geometry  of  the  topolog¬ 
ical  manifold  in  which  the  trajectories  mix.  Examples  of  the  mixing  geometry  of 
attractors  in  Pleurobranchaea  responses  and  model  systems  in  our  own  work  may 
be  found  in  Mpitsos^^^  *'*'*  and  Andrade  et  al.,®  respectively.  Though  we  have  used 
phase  portraits  to  obtain  an  intuitive  view  of  attractors,  a  single  dynamical  system 
may have phase portraits containing multiple, competing attractors.
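
The perturbation test described above can be illustrated numerically. The following is a minimal sketch and is not taken from the work cited here: it assumes a two-variable oscillator (the Hopf normal form, chosen only because its limit cycle is the unit circle) instead of the three-variable case used in the text, and the function name `relax_to_cycle` and all numerical settings are hypothetical.

```python
import math

def relax_to_cycle(x, y, t_end=20.0, dt=0.001, omega=1.0):
    """Integrate dx/dt = x(1 - r^2) - omega*y, dy/dt = y(1 - r^2) + omega*x
    (with r^2 = x^2 + y^2) by forward Euler and return the final radius.
    The unit circle r = 1 is the limit cycle of this assumed model."""
    steps = int(round(t_end / dt))
    for _ in range(steps):
        r2 = x * x + y * y
        dx = x * (1.0 - r2) - omega * y
        dy = y * (1.0 - r2) + omega * x
        x, y = x + dt * dx, y + dt * dy
    return math.hypot(x, y)

# A brief perturbation is modeled as a displaced starting state:
# one outside the closed loop (r = 2.0) and one inside it (r = 0.2).
outside = relax_to_cycle(2.0, 0.0)
inside = relax_to_cycle(0.2, 0.0)
print(outside, inside)  # both radii end close to 1
```

Both displaced states collapse back toward the same closed loop, which is the property used in the text to define attraction. The set of starting states that do so is the basin of attraction of this assumed model.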

The  above-cited  work  from  our  laboratory  also  discusses  a  variety  of  geomet¬ 
rical  and  computational  tools  that  may  be  used  to  determine  whether  the  activity 
is generated by limit-cycle or chaotic attractors. In either case, the most useful way of
determining whether the system is governed by an attractor is to conduct the
perturbation experiments described above, which are a major focus of our present efforts
in  both  biological  and  model  systems.  Much  experimental  work  needs  to  be  done  in 
this  way,  but  it  is  quite  likely  that  attractors  underlie  much  biological  function,  as 
shown,  for  example,  by  perturbation  experiments  designed  to  test  for  resetting  of 
the phase of oscillatory activity (an example of externally applied current pulses
to  one  of  the  BCNs  in  Pleurobranchaea  is  shown  in  Figure  3  in  Mpitsos*^^). 

3.4.2 LOW DIMENSIONALITY IN HIGH-DIMENSIONAL SYSTEMS As the system
evolves to dissipate perturbations, one would observe that the ensemble of points
in state space decreases over time, i.e., that there is volume contraction. Volume
contraction simplifies the topology of the structure defined by the trajectories, and
as pointed out by Thompson and Stewart, “This can often mean that a complex
dynamical system with even infinite-dimensional phase space...can settle to final
behavior in a subspace of only a few dimensions” (p. 1).

This phenomenon is particularly important in biological systems because they
are inherently high dimensional. A single cell in the visual cortex of the mouse,
for example, receives inputs from approximately 5000 other cells, each of which
may be a controlling variable. Numerical analyses of spontaneous cortical neuron
activity, of EEGs in olfactory bulb and cortex, and of motor patterns in
Pleurobranchaea all indicate that the activity is generated by relatively few
variables. One of the tasks facing work in animals such as Pleurobranchaea, and
of  correlative  computer  simulations,  is  to  identify  the  variables,  out  of  the  many 
available,  that  become  active  in  low-dimensional  activity,  and  to  identify  the  con¬ 
ditions  among  these  variables  that  permit  low  dimensionality  to  arise.  Part  of  the 
goal  of  our  computer  simulation  is  to  define  minimal  structures  that  permit  the  gen¬ 
eration  of  different  types  of  attractors,  and  to  determine  how  different  attractors 
might  arise  at  different  times  within  the  same  high-dimensional  space.  An  interest¬ 
ing  possibility  is  that  what  determines  which  sub-space  is  occupied  may  simply  be 
a  matter  of  what  attractor  becomes  established  first.  In  a  sense,  there  may  be  a 
type of competition such that the same behavior at different times may be
generated by somewhat different attractors arising from variable subsets of the
available  high-dimensional  possibilities. 

3.4.3 TURBULENCE, “ATTRACTING STATES,” AND SELF-ORGANIZING CRITICALITY

Given  weak  connections,  which  are  common  in  the  Pleurobranchaea  nervous 
system, it  is  not  inconceivable  that  different  limit-cycle  and  chaotic  attractors 
may  emerge  simultaneously  within  the  same  network,  moving  and  blending  in  space 
and time, and giving rise to the blending seen in whole-animal behavior and in
some  motor  patterns. These  conditions  may  provide  the  opportunity  for  analogs 
of  turbulence  to  occur. As  discussed  in  the  computer  studies  described  in  Section 
8,  we  believe  that  large  groups  of  neurons  need  not  all  act  in  a  coordinated  fash¬ 
ion,  particularly  when  a  large  number  of  relatively  weak  synapses  are  distributed 
throughout  the  network.  The  statistical  properties  of  the  network  and  the  effect  of 
weak  coupling  may  permit  conditions  under  which  different  subsets  of  the  extended 
network  are  able  to  begin  acting  cooperatively  within  themselves.  Yet  owing  to 
extensive  convergence  and  divergence  of  the  underlying  connectivity,  one  subset  of 
neurons  may  influence  the  coordinated  firing  of  other  subsets.  In  this  way,  small 
foci of coordinated firing may move spatially, blend, or separate into different foci,
much as one might envision of vortices in hydrodynamic turbulence. Instructive
examples of such phenomena in physical models have been presented in laboratory
simulation and computer simulations of the formation of the Great Red Spot of
Jupiter. Videotapes showing the evolution of vortices in the hydrodynamic model
and in the computer simulations were seminal in solidifying our own intuition about
what may happen in neural systems. In considering the possibility of turbulence
in neural systems, our own feeling is that the definition of “attractor” in such cases
may not be as suitable as in more definable spatio-temporal structures. We prefer
to use the term “attracting states.”

Attracting states may have some resemblance to mechanisms of self-organizing
criticality (SOC) proposed by Bak and coworkers.'® The ideas have been applied
to models of turbulence in forest fires and the production of unpredictable
avalanches that occur when attempting to build mounds of sand by
piling one grain of sand over another. Local effects are deterministic and easily
observed, but the global effects are not predictable from such local information,
and partly for these reasons, systems governed by SOC seem to be acting near
the “border of chaos.” To our knowledge, SOC has not been applied to nervous
systems.  We  envision  that  conditions  that  would  allow  SOC  to  take  place  would 
retain  the  deterministic  character  of  monosynaptic  actions  between  neurons,  but 
given  weak  interactions,  would  also  permit  statistical  or  random  spatio-temporal 
long-range  effects  through  polysynaptic  action. 
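
The sand-mound picture can be made concrete with a small cellular model. The sketch below assumes the standard sandpile rule commonly associated with SOC (a site holding four or more grains passes one grain to each of its four neighbors, and grains pushed past the edge are lost). The grid size, the number of grains, and the names `add_grain` and `run_sandpile` are hypothetical choices made for illustration, and nothing here is a model of a nervous system.

```python
import random

def add_grain(grid, i, j, threshold=4):
    """Drop one grain on site (i, j) of a square grid, relax the pile, and
    return the avalanche size (number of topplings). Grains pushed past
    the edge of the grid are lost."""
    n = len(grid)
    grid[i][j] += 1
    topplings = 0
    unstable = [(i, j)]
    while unstable:
        a, b = unstable.pop()
        if grid[a][b] < threshold:
            continue  # site is stable (or was relaxed by an earlier toppling)
        grid[a][b] -= threshold
        topplings += 1
        if grid[a][b] >= threshold:
            unstable.append((a, b))
        for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            na, nb = a + da, b + db
            if 0 <= na < n and 0 <= nb < n:
                grid[na][nb] += 1
                if grid[na][nb] >= threshold:
                    unstable.append((na, nb))
    return topplings

def run_sandpile(n=10, grains=2000, seed=0):
    """Add grains one at a time at random sites; return (avalanche sizes, grid)."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = [add_grain(grid, rng.randrange(n), rng.randrange(n))
             for _ in range(grains)]
    return sizes, grid

sizes, grid = run_sandpile()
print(max(sizes), sum(1 for s in sizes if s == 0))
```

Each toppling is a simple deterministic local rule, yet the size of the avalanche set off by the next grain depends on the configuration of the whole pile, which is the contrast between local and global effects drawn in the text.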


3.5 CHAOS AND OTHER FORMS OF VARIATION

3.5.1 BIFURCATION PARAMETERS AND CHAOS We shall examine bifurcation
parameters in more detail in Section 8. It is sufficient to state briefly that they are
parameter constants that control how a system (or its defining set of equations)
expresses  its  nonlinear  characteristics.  When  the  system  is  far  from  critical  points, 
changes  in  bifurcation  constants  have  relatively  little  effect  on  the  dynamics  of  the 
system.  At  or  near  critical  points,  small  changes  in  bifurcation  parameters  produce 
rapid  changes  (bifurcations)  in  the  response  of  the  system.  Within  certain  ranges  in 
the  values  of  these  parameters,  the  system  may  exhibit  rapid  shifts  between  different 
types of periodic activity and chaos as the parameter is successively changed.

The  simplest  definition  of  chaos  is  that  it  is  completely  deterministic  at  each 
step  of  its  temporal  evolution,  yet  over  the  long  term,  its  response  is  not  predictable. 
An example we shall discuss later is the logistic equation, given by
$X_{n+1} = R(1 - X_n)X_n$, where $R$ is the bifurcation constant. This equation has no
random factor in it, yet, for certain values of $R$, it is not possible to predict the
evolution of the time series several iterations into the future given some initial
starting value. Despite its long-term equivalence to random noise, the organized
geometry in plots of $X_n$ versus $X_{n+1}$ clearly shows the deterministic, non-random
character of chaos.
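
The behavior of the logistic equation is easy to check by direct iteration. The following minimal sketch is an illustration only: the values of the bifurcation constant (2.5 and 4.0), the starting value 0.3, and the function name `logistic_orbit` are assumptions chosen for the example and are not taken from the studies cited in the text.

```python
def logistic_orbit(r, x0, n):
    """Return [X_0, X_1, ..., X_n] for X_{k+1} = R (1 - X_k) X_k."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * (1.0 - xs[-1]) * xs[-1])
    return xs

# Periodic regime (R = 2.5): the orbit settles on the fixed point 1 - 1/R = 0.6.
settled = logistic_orbit(2.5, 0.3, 200)[-1]

# Chaotic regime (R = 4.0): two starts that differ by 1e-10 are compared.
a = logistic_orbit(4.0, 0.3, 100)
b = logistic_orbit(4.0, 0.3 + 1e-10, 100)
gap = [abs(u - v) for u, v in zip(a, b)]

# Points of the return plot (X_n against X_{n+1}); all lie on one parabola.
return_map = list(zip(a[:-1], a[1:]))
print(settled, gap[5], max(gap))
```

The two chaotic orbits agree closely for the first few iterations and then separate until they are unrelated, although every step is fully determined. The return plot shows the underlying order: each pair falls exactly on the curve $X_{n+1} = R(1 - X_n)X_n$.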

It  is  difficult  to  prove  that  biological  systems  generate  chaotic  attractors,  owing 
primarily to their short-lived and apparently nonstationary behavior. However,
computer simulations have clearly shown that Hodgkin-Huxley membranes and
the parabolic burster neuron, R15, in the abdominal ganglion of Aplysia may
be capable of bifurcating into a broad spectrum of simple periodic and chaotic
activity. Our previous studies on the implications of attractors and variation, and
of their implication in the generation of contexts of interrelated firing in groups of
neurons, have been discussed in behavioral and neurophysiological studies.

And there is some evidence for chaos in the responses of individual BCNs and
motor neurons in Pleurobranchaea. Other activity of single neurons is more
consistent with noisy limit cycles.

The lessons to be gained from chaos are: (1) as illustrated by the logistic equation,
variations arising from chaos are not “noise” superimposed on the information-carrying
signal; they themselves represent the information. (2) The information in
chaotic  systems  is  always  increasing  with  respect  to  information  available  at  a  given 
initial  time.  This  is  to  say  that  if  chaos  is  to  represent  behavior,  it  is  necessary  to 
use  the  long-term  phase-space  geometry  of  the  attractor  driving  the  system  to  gain 
a  view  of  what  the  behavior  is  like.  Given  equal  noise-free  conditions,  the  behav¬ 
ior  represented  by  periodic  activity  can  be  defined  in  a  single  orbit.  (3)  Periodic 
or  limit-cycle  activity  dissipates  perturbations  differently  than  chaotic  systems.  As 
pointed out by Conrad, limit cycles in biological motor systems dissipate
perturbations in ways equivalent to heat loss through the body structures innervated by
the  neural  system  in  question,  whereas  chaotic  attractors  dissipate  the  perturba¬ 
tions  by  generating  new  variations.  Limit-cycle  attractors  always  return  to  doing 
behaviors  in  the  same  stereotyped  ways.  Chaotic  attractors  generate  new  variations 
naturally  in  response  to  perturbations  because  their  sensitivity  to  initial  conditions 
always  forces  them  to  generate  the  behaviors  in  different  ways,  which  is  to  say 
that behaviors are always different in chaotic systems. (4) Mpitsos and Burton^^®
have  shown  that  chaotic  discrete  processes,  much  as  might  occur  in  spike  trains 
communicating  between  networks,  allow  simple  networks  to  perform  complicated 
tasks  that  would  require  considerably  more  complex  networks  to  perform  if  the 
signals  were  generated  by  nonchaotic  discrete  processes  or  by  continuous  periodic 
or  continuous  chaotic  processes.  (5)  It  was  also  shown  that  the  inherent  variations 
of chaotic discrete processes permit networks that receive such signals to
optimize their responses either in transmitting the signal one-for-one or in performing
computations  on  them.  That  is,  the  deterministic  character  of  chaotic  discrete  pro¬ 
cesses  allows  them  to  convey  information,  yet  their  long-term  randomness  provides 
sufficient  variation  to  allow  the  responding  network  to  learn  rapidly.  As  we  shall 
discuss below, random noise may be used advantageously to perform such
optimizations. But random noise has the disadvantage of being high dimensional, and
high-dimensional processes are difficult to generate because they must represent
many degrees of freedom. Chaotic processes are long-term equivalent to random
noise,  yet  the  expression  of  chaos  can  be  easily  controlled  using  low-dimensional 
systems  and  simple  adjustments  to  a  single  control  parameter,  as  in  the  logistic 
equation.  In  multibehavioral  systems  such  as  Pleurobranchaea,  the  combined  infor¬ 
mational  content  and  variation  of  chaos  may  be  useful  in  accessing  the  different 
response  possibilities.^^® 

3.5.2 BIFURCATION-INDUCED VARIATIONS Another form of low-dimensional
variation arises when systems approach bifurcation points. An intuitive understanding
for  this  may  be  given  by  recalling  the  above  discussion  on  the  demonstration  of 
attractors  lying  in  three-dimensional  space,  and  using  this  example  to  understand 
what happens to Lyapunov exponents as the system approaches bifurcation points.
In a system governed by three variables, there are three exponents. (A useful
discussion of Lyapunov exponents and numerical methods for estimating them is
presented in Wolf^^".) A negative Lyapunov exponent indicates that there is
contraction in a given direction in phase space. If all three exponents were negative,
the  flow  of  points  in  phase  space  would  collapse  in  all  directions  into  a  single  point. 
For  continuous,  bounded  systems  not  at  a  fixed  point,  at  which  the  system  remains 
at  equilibrium  at  some  non-changing  parameter  state  (see  definition  in  Thompson 
and  Stewart, p.  194),  Haken'^  has  shown  that  one  of  the  exponents  must  be  zero. 
In  a  simple  limit  cycle  governed  by  three  variables,  the  remaining  exponents  must 
be  negative.  The  negativity  in  the  sum  of  the  exponents  assures  that  there  is  an 
overall  contraction  in  the  flow  of  points  in  phase  space  to  keep  the  system  bounded. 
The  summed  negativity  also  assures  that  the  system  will  dissipate  perturbations  if 
they  are  not  so  large  as  to  push  the  state  beyond  the  attractor's  basin  of  attrac¬ 
tion.  Bifurcations  into  chaos  introduce  a  positive  exponent,  but  retain  the  criteria 
of  one  zero-valued  exponent  and  that  the  sum  of  the  exponents  be  negative.  The 
positive  exponent  shows  that  the  state  of  the  system  in  the  corresponding  dimen¬ 
sion  of  phase  space  is  always  expanding.  Having  a  zero-valued  Lyapunov  exponent 
indicates  that  the  growth  in  phase  space  is  neither  contracting  nor  expanding  over 
time. Thus, the rate of growth of a three-variable[*] system in phase space is given
by $2^{(\lambda_1 + \lambda_2 + \lambda_3)t}$, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the corresponding
Lyapunov exponents for growth in each direction of phase space, and $t$ is time.
Since the exponential change is given as base 2, the exponents express the rate of
change of growth in phase space as information in bits per second. Thus limit cycles
lose information as they evolve with respect to some initial state, whereas chaotic
systems gain information.
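
The sign convention for base-2 exponents can be checked on the one-variable logistic equation, which has a single exponent measured in bits per iteration instead of bits per second. The sketch below is an assumption-laden illustration, not a method from the works cited: the function name `lyapunov_bits`, the orbit lengths, and the sampled values of the bifurcation constant are hypothetical. The exponent is estimated as the orbit average of $\log_2 |R(1 - 2X_n)|$, the logarithm of the local slope of the map.

```python
import math

def lyapunov_bits(r, x0=0.3, transient=1000, n=20000):
    """Estimate the Lyapunov exponent of X_{k+1} = R (1 - X_k) X_k in bits
    per iteration, as the average of log2 |R (1 - 2 X_k)| along the orbit."""
    x = x0
    for _ in range(transient):  # discard the approach to the attractor
        x = r * (1.0 - x) * x
    total = 0.0
    for _ in range(n):
        slope = abs(r * (1.0 - 2.0 * x))
        total += math.log2(max(slope, 1e-300))  # guard against log2(0)
        x = r * (1.0 - x) * x
    return total / n

for r in (2.5, 2.99, 3.2, 3.9):
    print(r, round(lyapunov_bits(r), 3))
```

For R = 2.5 the fixed point has slope 2 - R = -0.5, so the estimate should be close to -1 bit per iteration: one bit of information about the initial state is lost at each step. A positive estimate, as expected in the chaotic range, corresponds to information being gained, and an estimate near zero is expected just below the bifurcation at R = 3.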


[*] The need for three variables in continuous systems that can generate chaos may be viewed
intuitively by examining the flow of trajectories in phase space and their ability to mix as they
course through the attractor surface; a typical trajectory will visit every vicinity. Evidence for
mixing can be obtained by cutting a Poincaré section through the phase portrait and noting the
interrelated  positions  of  the  crossings  of  the  trajectory  through  the  section. If  one  places  a 
string  on  a  flat  surface  defined  by  two  variables,  it  is  possible  to  conform  the  shape  of  the  string 
to  flow  to  a  fixed  point,  to  form  a  variety  of  self-similar  spirals,^®®,  or  to  connect  the  two  ends  of 
the  string  to  form  limit  cycles  (also  see  a  discussion  of  the  Jordan  curve  theorem  and  the  theorem 
of Poincaré-Bendixson in Hofbauer and Sigmund^®). However, it is not possible to have nearby
lengths  of  the  string  diverge  from  one  another  and  eventually  mix  in  their  interrelated  positions 
without  causing  the  string  to  cross  on  itself  somewhere  unless  the  trajectories  flow  into  a  third 
dimension  and  then  fold  back  onto  a  thickened  plane;  i.e.,  however  imperceptible,  there  must  be 
a  thickness  to  the  surface  of  the  attractor  composed  of  countless  layers  arising  from  continuous 
stretching and folding which brings distant trajectories close together. Discrete processes, on the
other  hand,  can  generate  chaos  in  a  single  dimension,  as  shown  by  the  logistic  equation. 

As a system approaches bifurcation points, some of the Lyapunov exponents
approach zero values, as we show herein for the catalytic network model of
Andrade et al.®'^'*® Setting the bifurcation parameter, $\mu$, to a value of .02 generates a
one-period limit cycle far from a bifurcation point, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ have values,
respectively, of 0, -2.8, and -43. Adjusting $\mu$ to .0125, well past the bifurcation into
a two-period limit cycle, the exponents have values of 0, -3.6, and -43. However,
setting $\mu$ to .0149, which is near the bifurcation point, the exponents are 0, -.05,
and -46; $\lambda_2$ vanishes. Thus, as the system approaches bifurcation points, a greater
number of Lyapunov exponents approach zero than when the system is farther
away from these points. Perturbations in directions of phase space governed by
exponents  having  small  negative  values  would  be  dissipated  slowly.  Even  in  model 
systems  having  no  extraneous  injected  noise,  transient  variations  are  often  difficult 
to  remove  when  attempting  to  locate  bifurcation  points. 
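
The slow dissipation of perturbations near a bifurcation point can be seen in a much simpler system than the catalytic network. The sketch below uses the logistic equation as a stand-in, which is an assumption made only for illustration: it counts how many iterations a small displacement from the fixed point needs to die out, far from and just below the period-doubling bifurcation at R = 3. The function name `settling_time`, the size of the displacement, and the tolerance are hypothetical.

```python
def settling_time(r, kick=0.01, tol=1e-6, max_iter=100000):
    """Iterations needed for X_{k+1} = R (1 - X_k) X_k to come back within
    `tol` of its fixed point 1 - 1/R after a displacement `kick`.
    Meaningful for 1 < R < 3, where that fixed point is stable."""
    fixed = 1.0 - 1.0 / r
    x = fixed + kick
    for k in range(max_iter):
        if abs(x - fixed) < tol:
            return k
        x = r * (1.0 - x) * x
    return max_iter

far = settling_time(2.5)    # far from the bifurcation at R = 3
near = settling_time(2.99)  # just below it
print(far, near)
```

Close to the bifurcation the same displacement takes many more iterations to decay, which is why transients are hard to remove when a bifurcation point is being located, even without injected noise.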

Kelso, Scholz, and Schöner*^^ have given the term “critical fluctuations” to
the variations observed in human finger movements during phase transitions, or,
in our terminology, at critical bifurcation conditions. We have observed similar
fluctuations in our own studies using sinusoidal current to drive individual neurons
in Pleurobranchaea and Aplysia. Moreover, since the Pleurobranchaea buccal-oral
system appears to sit metastably near transitions into different patterns of activity
(as shown, for example, by frequent spontaneous transitions of activity in isolated
nervous systems; e.g., see Mpitsos^'^'*), we should expect to see variations in activity
simply  because  of  the  tendency  of  the  system  to  pass  through  bifurcation  conditions. 
In  model  networks,  it  is  possible  to  generate  activity  in  the  system  long  enough  to 
get  rid  of  transients.  But  biological  systems,  which  generally  do  not  have  such  long¬ 
term  luxury,  should  exhibit  considerable  variation  simply  because  of  bifurcation 
effects,  unless  they  lie  far  from  critical  points. 

A  rather  interesting  problem  of  bifurcation-induced  variations  occurs  in  regions 
of  the  controlling  parameter  that  cause  chaos.  Such  regions  are  filled  with  sub- 
regions  that  lead  to  periodic  activity,  as  can  easily  be  demonstrated  by  examining 
the  bifurcation  parameter  of  the  logistic  equation  at  expanded  scales. Therefore, 
small  changes  in  a  control  parameter  may  actually  lead  to  rapid  shifts  between  chaos 
and  periodicity,  with  each  state  being  accompanied  by  transient  variations.  Clearly, 
there  is  a  need  to  understand  how  biological  systems  cope  with  the  sensitivity  in 
the  adjustment  of  bifurcation  parameters  and  with  the  different  forms  of  variations 
that  arise  from  such  adjustments.  One  possibility  may  be  that  the  large  number  of 
converging and diverging connections among neurons may buffer unwanted
bifurcation conditions by lifting the controlling effect from residing in a single neuron or
a  few  of  them  and  distributing  it  over  a  large  number  of  neurons.  In  this  way,  the 
bifurcation  conditions  emerge  from  group  action,  though  individual  neurons  may 
exhibit near-critical behavior. This may also be a reason for the observation of the
wide  distribution  and  convergence  of  neurotransmitters  and  modulators. 
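
The interleaving of periodic sub-regions with chaos can be probed by testing whether an orbit of the logistic equation repeats after a short number of steps. The following is a minimal sketch under stated assumptions: the sampled values of the bifurcation constant, the tolerance, and the helper name `detect_period` are hypothetical, and a result of `None` only means that no cycle up to `max_period` was found.

```python
def detect_period(r, x0=0.3, transient=10000, max_period=64, tol=1e-9):
    """Return the smallest p <= max_period with |X_{k+p} - X_k| < tol after
    discarding a transient, or None if no such short cycle is found."""
    x = x0
    for _ in range(transient):
        x = r * (1.0 - x) * x
    start = x
    for p in range(1, max_period + 1):
        x = r * (1.0 - x) * x
        if abs(x - start) < tol:
            return p
    return None

for r in (3.2, 3.5, 3.83, 3.9):
    print(r, detect_period(r))
```

The value R = 3.83 lies inside the well-known period-three window, which sits between parameter values that give irregular orbits. A small change in the control parameter can therefore move the system between chaos and periodicity.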

3.5.3 RANDOM NOISE Other variability in Pleurobranchaea seems to be
high-dimensional, or even random, as shown by the response of a single neuron in
Figure 1 of Mpitsos^^® and by the analysis of electromyograms in Mpitsos.'^ It has
long been known that a little random noise may help systems to avoid local
minima, which may be defined for the present purposes as non-optimal responses (see
Figure 8 in Burton and Mpitsos"^ for a diagrammatic demonstration of local
minima). The physicochemical properties of DNA provide an example of one use of
noise in biological studies. Heating solutions of DNA (injecting noise into the
system) breaks the two complementary strands apart. If the solution is cooled too
rapidly, the original complementary bonds between base pairs are not completely
restored; i.e., the system has fallen into a local minimum. If the solution is cooled
slowly, the strands recombine optimally, forming the absolute minimum. Thus, the
terms “local minima” and “absolute minimum” may be used to refer to a number of
characteristics,  such  as  information  storage,  reconstruction  of  an  original  template, 
and  energy  level.  Such  processes  of  noise  control  are  time  dependent,  and  usually 
control  noise  by  decreasing  it  exponentially.  The  method  is  referred  to  as  simulated 
annealing. Kirkpatrick, Gelatt, and Vecchi®^ discuss simulated annealing and apply
it  to  several  optimization  problems,  including  the  placement  of  computer  chips  on 
a  circuit  board,  in  which  the  goal  is  to  minimize  wire  length  and  bends,  and  the 
traveling salesman problem, in which the goal is to minimize the distance traveled
between  cities  if  each  city  is  visited  only  once.  Simulated  annealing  is  time  depen¬ 
dent  because  it  requires  the  noise  in  the  system  to  have  a  decay  rate,  and  once  the 
noise has died out, it is necessary to introduce noise into the system again in order
for  it  to  be  ready  to  respond  to  a  new  situation.  Biological  systems  are  generally 
event  dependent,  not  time  dependent.  It  may  be  difficult  or  impossible  to  determine 
in  advance  when  the  next  challenge  to  survival  will  occur  or  what  it  will  be,  and 
when  to  re-inject  noise  into  the  system.  Once  a  challenge  has  presented  itself,  there 
may  not  be  enough  time  to  adjust  the  rate  of  decay  of  noise. 
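
The time dependence of simulated annealing is visible in even the smallest example. The sketch below is not one of the optimization problems cited above: it assumes a hypothetical nine-position energy landscape with one local minimum and one absolute minimum, a Metropolis acceptance rule, and an exponentially decaying noise level. The name `anneal` and all parameter values are illustrative assumptions.

```python
import math
import random

# Hypothetical energy landscape over positions 0..8: a local minimum at
# position 2 (energy 1) and the absolute minimum at position 6 (energy 0).
ENERGY = [5, 3, 1, 3, 4, 2, 0, 2, 5]

def anneal(start, t0, decay=0.995, steps=5000, seed=0):
    """Metropolis random walk whose noise level (temperature) decays
    exponentially with the step count. Returns (final, best) positions."""
    rng = random.Random(seed)
    pos = best = start
    temp = t0
    for _ in range(steps):
        cand = pos + rng.choice((-1, 1))
        if 0 <= cand < len(ENERGY):
            rise = ENERGY[cand] - ENERGY[pos]
            # Downhill moves are always taken; uphill moves need noise.
            if rise <= 0 or (temp > 0 and rng.random() < math.exp(-rise / temp)):
                pos = cand
                if ENERGY[pos] < ENERGY[best]:
                    best = pos
        temp *= decay  # predefined, time-dependent schedule
    return pos, best

print(anneal(start=2, t0=0.0))  # no noise: the walker cannot leave position 2
print(anneal(start=2, t0=3.0))  # with decaying noise it may cross the barrier
```

Once the noise has decayed the walker is frozen wherever it happens to be, so the schedule has to be restarted before a new problem can be addressed. This is the limitation for event-dependent biological systems that is raised in the text.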

As  a  step  in  determining  how  random  noise  might  be  used  in  adaptive  sys¬ 
tems,  Burton  and  Mpitsos^^  devised  time-independent  noise  algorithms  (TINA) 
that  control  noise  through  the  response  of  the  system,  as  would  occur  in  natu¬ 
ral  environments,  rather  than  through  predefined  time  schedules.  To  demonstrate 
the algorithm, Burton and Mpitsos used simple nonbiological neural networks that
were  required  to  learn  to  transmit  or  manipulate  chaotic  input  signals,  much  as 
might  occur  if  networks  communicated  with  one  another  with  chaotic  spike  trains. 
Networks  were  trained  using  an  error-backpropagation  algorithm.^®®  Random  noise 
was  added  to  the  learning-induced  changes  in  synaptic  weights  and  thresholds,  but 
the  level  of  the  injected  noise  was  adjusted  on  the  basis  of  the  amount  of  error  gen¬ 
erated  each  time  the  network  responded  to  an  input  event.  By  such  adjustments 
it  was  possible  to  avoid  local  minima  and  speed  the  process  of  reaching  maximal 
levels  of  learning. 
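
The idea of letting the response of the system set the noise level can be sketched in a few lines. The example below is explicitly not the TINA algorithm of Burton and Mpitsos and does not involve backpropagation networks or chaotic input signals. It is a toy, assumed for illustration only, in which a random walker on a hypothetical error landscape receives noise in proportion to its current error instead of according to a clock. The name `error_driven_search` and the `gain` parameter are hypothetical.

```python
import math
import random

# Hypothetical error landscape over positions 0..8: a local minimum at
# position 2 (error 1) and zero error at position 6.
ERROR = [5, 3, 1, 3, 4, 2, 0, 2, 5]

def error_driven_search(start, gain=1.0, steps=20000, seed=0):
    """Random walk in which the noise level is set by the current error
    (noise = gain * error) instead of by a time schedule."""
    rng = random.Random(seed)
    pos = start
    for _ in range(steps):
        noise = gain * ERROR[pos]  # no clock: noise follows performance
        cand = pos + rng.choice((-1, 1))
        if not 0 <= cand < len(ERROR):
            continue
        rise = ERROR[cand] - ERROR[pos]
        if rise <= 0 or (noise > 0 and rng.random() < math.exp(-rise / noise)):
            pos = cand
    return pos

print(error_driven_search(start=2))
```

Because the noise vanishes only where the error is zero, the walker keeps being perturbed while it sits in the local minimum and comes to rest once the error has been removed. No schedule has to be chosen in advance, and noise returns on its own whenever the error rises again.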

3.5.4 VARIATION-DEPENDENT OPTIMIZATION IN MULTIFUNCTIONAL SYSTEMS

Thus, random noise, chaos, and possibly variations arising from bifurcation
conditions may provide conditions leading to two different methods of optimization.
The  effect  of  chaotic  discrete  processes  was  shown  under  conditions  in  which  chaos 
would  act  as  a  transmitter  of  information  between  networks,  whereas  the  effect  of 
noise  was  shown  when  it  was  added  to  changes  in  synaptic  weights  and  thresholds 
during  learning  when  the  network  had  to  respond  to  the  chaotic  signal.  However, 
chaos  is  only  short-term  deterministic.  The  long-term  statistics  of  chaotic  discrete 
processes, as might occur in spike trains, are identical to random noise. For systems
such as Pleurobranchaea or the mammalian olfactory bulb^^^ that are
multifunctional or contain multiple information within the same set of connections, variations
that allow the system to search for one of many attractors or attracting states may
be  essential. 

The  three  types  of  variation  mentioned  above  involve  different  search  strategies 
and  control  methods.  Chaos  has  a  deterministic  search  strategy  and  can  be  con¬ 
trolled  through  bifurcation  parameters  in  membrane  dynamics, synaptic  release 
(see the interesting suggestion in Kriebel et al.^°^) and, as we shall discuss in Section
8,  in  synaptic  strengths.  Neural  systems  may  be  able  to  approximate  randomness 
simply  by  using  weak  synapses  and  by  taking  advantage  of  the  large  number  of 
connections between cells. For example, connections between 10-100 neurons may
provide  sufficient  degrees  of  freedom  to  approximate  the  high  dimensionality  of 
Gaussian  noise.  A  number  of  activity-dependent  changes  in  synaptic  strengths  or 
in  the  probability  of  transmitter  release^^  might  provide  methods  to  control  noise 
naturally  and  in  time-independent  ways.  Some  of  the  “noise”  or  variations  that 
occur  near  bifurcation  points  are  deterministic  and  self-controlled  because  they  are 
transients  that  die  out  asymptotically  as  the  activity  evolves  over  time.  Decreases  in 
the  value  of  Lyapunov  exponents  near  bifurcation  points  would  also  allow  random 
effects  to  become  amplified,  but  as  the  system  passes  through  bifurcation,  both  the 
transient  effects  and  random  variations  diminish. 

Variation,  not  chaos.  The  point,  then,  in  thinking  about  adaptive  mechanisms 
is  to  understand  the  use  of  a  spectrum  of  variational  types.  Owing  to  its  interesting 
phase-space  geometry  and  its  long-term  unpredictability,  chaos  has  received  much 
press.  The  important  issue,  however,  is  not  chaos,  but  variation  and  its  control, 
and  the  way  variation  affects  the  ability  of  the  system  to  access  different  dynamical 
states.  The  neural  architectures  that  support  the  generation  of  these  variabilities 
and  ones  that  lead  to  control  are  unexplored.  We  provide  suggestions  in  Section  8. 


3.6  ERROR  AS  AN  INTEGRATIVE  PRINCIPLE 

A  system  that  has  evolved  to  meet  only  one  adaptive  need  can  be  highly  tuned  to 
perform  that  task  well,  but  when  confronted  with  new  adaptive  needs,  such  systems 
may  prove  extremely  fragile.  Alternatively,  if  the  system  is  naturally  variable  the 
output  may  never  be  exactly  “right”  for  a  given  task,  but  it  may  be  right  enough  for 
the system to adapt successfully to different situations. Moreover, given a limited
number of neurons, a greater range of outputs may be possible when the system
has variable and blendable outputs than when the system contains a rigidly fixed
number  of  output  patterns. 

Error  may  not  only  be  a  product  of  system  dynamics,  it  may  also  be  influential 
in establishing the dynamics. The first indication of this was in studies of
hypercycle catalytic networks originally devised to account for the first steps in
chemical or prebiotic evolution. Schnabl, Stadler, Forst, and Schuster^*^^ recently
showed that error, expressed as mutual intermutation between reactive molecular
species, significantly affects the ability of a system to bifurcate into complex, chaotic
oscillations. Andrade et al. provide a more biologically plausible model of error
utilization in catalytic networks that may be modifiable for application to studies of
neural  networks.  In  this  model,  error  arises  from  faulty  replication;  i.e.,  in  mutual 
intermutation  the  error  is  transformed  into  information  contained  in  another  re¬ 
actant  species,  whereas  in  faulty  replication,  information  is  simply  removed  from 
the  system.  Although  the  generation  of  complex  (chaotic)  behavior  in  this  latter 
model  is  less  sensitive  to  changes  in  error  than  the  mutual  intermutation  model, 
analysis  of  both  models  using  the  level  of  error  as  the  bifurcation  parameter  shows 
that  error  plays  a  role  in  the  dynamics  occurring  among  the  catalytic  interaction. 


3.7  DEFINITIONS:  DYNAMICS,  BEHAVIOR,  AND  MULTIFUNCTIONALITY 

The  above  discussions  provide  the  background  for  us  to  present  several  working 
definitions. In the most general terms, we take the term “dynamics” to imply the
generation  of  cooperative  activity  among  a  group  of  interacting  components  of  a 
system.  There  may  be  many  different  dynamical  mechanisms:  linear  shifts  in  the 
aggregates  of  coactive  components,  bifurcations,  limit-cycle  and  chaotic  attractors, 
attracting  states,  turbulence,  and  self-organizing  criticalities  are  just  a  few  exam¬ 
ples  that  we  mentioned.  As  we  shall  attempt  to  illustrate  further  in  Section  8, 
our  definition  of  “neurocircuits”  relies  heavily  on  dynamics  rather  than  network 
architecture. 

In much of the preceding discussion, we have used the term “behavior” in
the sense that the behaviors are distinctly different, as if feeding, regurgitation,
righting, and other behaviors in the animal’s repertoire were definable. Indeed,
the notion of a repertoire seems to indicate that they are definable. However,
our above discussion of “contexts” and “consensuses” shows that we do not believe
that behaviors need be repeatedly the same. For example, the animal ingests
food, it may regurgitate it, and it may right when inverted. Yet the animal may
perform these behavioral effects in many different ways. If we are correct in our
assessment of variations in neural activity and contexts, it is possible that the
kinematics of the behavioral effect are always changing. Given this blurring of what the
term “behavior” may mean, it is obvious that systems capable of generating many
different behaviors using the same neurons must be defined in ways that include
variation. Therefore, multifunctional networks, to us, imply patterns of activity and
behavioral effects that can lead variably from one effect to another as well as the
generation of distinctly different behaviors.


3.8  DEFINITION  OF  CONTEXTS  IN  GROUP  ACTION:  LINEAR  AND 
NONLINEAR  ORGANIZATION 

To  gain  some  perspective  on  our  definition  of  contexts  in  group  function,  the  above 
subsections  provide  some  of  the  necessary  background  on  what  we  mean  by  behav¬ 
ior  and  what  we  mean  by  nonlinear  dynamics  and  attractors,  different  modes  of 
cooperative  action,  and  optimization  and  its  relationship  to  different  forms  of  vari¬ 
ation  as  these  factors  play  on  attractors  and  on  turbulence-like  phenomena.  The 
discussion  has  introduced  the  importance  of  local  minima  and  error.  The  heart  of 
all  of  these  response  phenomena  lies  in  the  anatomy  of  convergence  and  divergence. 
It  is  easy  to  refer  to  behavior,  but  once  closely  examined,  we  have  realized  that  be¬ 
havior  may  not  be  as  definable  as  presumed,  though  we  do  not  deny  that  definable 
behaviors  do  exist. 

We began this section using references to studies that have considered how
distributed interactions among neurons lead to behavior, and which have proposed
that the appropriate behavior arises when a large number of neurons, or perhaps
all or most of them, become active. This is part of what we mean by
“contexts” and “consensuses.” Linear summations such as implied by “large number”
do not address two important problems. First, if attractors or other nonlinear
phenomena arise, it is not necessary for the majority, or a large number of neurons,
to become active. That is, coherent activity may take place among a minority of
neurons, but if the coherence is strong enough, we believe that its effect may override
activity that is less strongly organized, though both coherent and noncoherent
activity probably affect the actual expression of the resultant behavior. The question,
then, is not how many neurons become active but how strong the coherent
activity is above a “noise” level. Second, even if the interactions are linearly related,
or if robust, stable attractors have not organized, adaptive responses may still take
place, though the effect may not be as strong as in cases when the majority of
neurons act together or when there are strong attractors.
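
The contrast drawn above between the number of active neurons and the strength of
their coherence can be illustrated with a minimal sketch. It is not a model taken
from this chapter. It assumes, purely for illustration, that every active neuron
contributes a unit phasor at a shared rhythm frequency, that coherent neurons share
one phase, and that all other active neurons have independent random phases. The
function names and population sizes are hypothetical.

```python
import cmath
import math
import random


def summed_amplitude(n_coherent, n_incoherent, rng):
    """Magnitude of the summed unit phasors of all active neurons."""
    total = complex(n_coherent, 0.0)  # phase-locked neurons add in phase
    for _ in range(n_incoherent):
        # each incoherent neuron gets its own random phase (assumption)
        total += cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
    return abs(total)


def mean_amplitude(n_coherent, n_incoherent, trials=50, seed=1):
    """Average the summed amplitude over several random phase draws."""
    rng = random.Random(seed)
    values = [summed_amplitude(n_coherent, n_incoherent, rng) for _ in range(trials)]
    return sum(values) / trials


# 1000 active neurons in both cases; only the degree of coherence differs.
minority = mean_amplitude(n_coherent=100, n_incoherent=900)
majority_incoherent = mean_amplitude(n_coherent=0, n_incoherent=1000)
print("coherent 10% among noise:", round(minority, 1))
print("all active, none coherent:", round(majority_incoherent, 1))
```

Under these assumptions the in-phase contributions add in proportion to their
number, whereas the incoherent contributions grow only as roughly the square root
of their number, so a coherent minority can stand well above the “noise” of a fully
active but unorganized population.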


4  BEHAVIORAL  AND  NEUROPHYSIOLOGICAL  FINDINGS  IN 
OTHER  ANIMALS 

4.1  INVERTEBRATES 

4.1.1 OVERVIEW OF MULTIFUNCTIONALITY AND VARIABILITY Taking advantage
of well-defined connections between four identifiable cells in the buccal ganglion
of Aplysia, Gardner has shown that synaptic effects between identified neurons
vary widely from animal to animal. Drawing an analogy to connectionist neural
networks, Gardner points out that the importance of a network is not so much in
what its synaptic strengths are but rather in what the set of synapses together can
do in expressing the information in an algorithmic process. The difference between
biological networks and neural networks is that the temporal interrelationships in
the firing of neurons may shift, and that the same network may be able to generate
different patterns of activity. Thus, in Gardner's terms, a set of connections
may contain the information for many different algorithms. Our modification to
this is that one must not consider the algorithm as being repeatedly the same; i.e.,
the algorithm is itself variably expressed.

Recent findings in the sea slug Aplysia and in lobsters
are consistent with the notion that the same network can produce activity relating
to different behaviors (i.e., they are multifunctional), as is the work on yet another
sea slug, although only the work on Aplysia has taken notice
of variation. An important paper describes leech locomotion, and asks what it
is that the “central pattern generator” really mediates, since a variety of variable
behaviors were observed. Kien has published a series of insightful papers
on locust walking, and has addressed the notion of variation through observations
indicating that different groups of neurons become active to produce a behavior.
Variability has also been reported in walking motor patterns in cockroaches.

By the late 1970s the notion that “hard wired” networks can explain behavior
had received strong support from studies on genetically inherited ability to generate
patterned activity in many animals. Nonetheless, ten years later, Getting
voiced the following interesting conclusion from his work in Tritonia, "Networks
with similar connections can produce dramatically different motor patterns, and,
conversely, similar motor patterns can be produced by dramatically different
networks," just as one can read from the work in Pleurobranchaea that, "Organized
activity emerges or self-organizes such that different contexts of the same coactive
neurons become involved in generating the same or different motor pattern." Much
evidence in neurobiology has shown that it is possible to ascribe particular function
to identified neurons, and criteria of how to do that have been extensively
discussed. Some of the same researchers have also recently put forth the
contrasting notion that conditions might exist under which it may not be
possible to ascribe function to particular neurons.

Thus, although the classical perspective still seems to hold, and much evidence
exists to support it, there is a growing awareness of alternative possibilities. Our
feeling is that it may be difficult to make direct comparisons between animals,
even if there seem to be many similarities, as there are, for example, in the general
neuroanatomical features of the nervous systems in snails and slugs indicating that
their nervous systems contain neurons such as the BCNs in Pleurobranchaea. It may
be, for example, that feeding systems in animals that evolved to utilize relatively
stable and predictable food sources may be less variable than ones having to cope
with unpredictable ones. One might envision such a comparison between certain
herbivores and carnivores, though the defining experiments have not been done.
What is most important in all of this is that people have begun to address the issues,
and quite likely the most illuminating comparisons will be ones that involve different
response dynamics. Our bias is that variation should be a common observation. In
cases not exhibiting variation, the question then has to do with the mechanisms
that control variation.

4.1.2 BIFURCATION AND RESPONSE MODALITY IN THE LOBSTER STOMATOGASTRIC
SYSTEM The recent discovery of the ability of the stomatogastric ganglion in
lobsters to generate different behaviors shows clearly that one must not assume
that even the simplest networks produce only single responses. The findings
of Cardi et al. are worth casting in our frame of reference relating to bifurcation.
The stomatogastric ganglion in lobsters contains a subset of 14 neurons that
comprise the pyloric network, which acts as a central pattern generator. Of particular
interest in this network is a further subset of three pacemaker neurons that
form the oscillator. Another oscillator lying in the commissural ganglion sends
projections to the stomatogastric ganglion. By using sucrose-block techniques on the
nerve interconnecting the two ganglia, it was possible to reversibly interrupt the
connections between the two oscillators. When the projections were blocked,
systematic injection of depolarizing and hyperpolarizing current into one of the three
pyloric pacemaker neurons resulted in continuous variation in the period of oscillatory
bursts of activity in the pyloric rhythm. But when these projections were not
interrupted, the period varied discontinuously, and, for some ranges of the injected
current, two modes of oscillation emerged at a particular level of injected current.
Overall the results show that the timing between the two oscillators affected the
modes of integration in the pyloric network, and that the commissural projections
also exerted neuromodulatory control over the pyloric network.

There are two ways to look at these data. The first is that there is some reflex
circuit change that alters the oscillations in the pyloric network when the connection
between the two pattern generators is intact. This seems reasonable if one considers
that neuromodulation may be capable of adjusting which neurons participate in the
oscillatory interactions or their interrelated timing (e.g., Marder). Using
John’s terminology, the network may use “switchboard” factors to control whether
the network produces unimodal or bimodal firing in its burst patterns.

A broader perspective holds that the role of transmitters and modulators is to
raise the network closer to a critical point for bifurcation. Small, systematic
adjustments in the current injected into one of the three pattern-generating neurons push
the system beyond the critical point, allowing the network as a whole to oscillate
in two modes, or to jump discontinuously from one period to another. When that
transmitter (or transmitters) is not present, as when the connections between the
oscillators are interrupted, the system settles into a state that is far from the
bifurcation point. In this case, no amount of injected current will push the network
close enough to the critical point to permit bifurcation to take place. What does
happen is that the period varies continuously as a function of the strength of the
injected current. This is precisely what happens when one varies the bifurcation
parameter in a system that is far from a critical point (e.g., see Thompson and
Andrade). There are two potential bifurcation parameters in the study of Cardi et
al. The way the experiments were conducted uses the polarization state (amount
of injected current) of one of the pattern-generating neurons as the bifurcation
parameter. However, if there were sufficient knowledge of the cells in the commissural
ganglion that project to the pyloric ganglion, their level of firing could be used
as the bifurcation parameter for each level of applied polarization in the
pattern-generating neuron.
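
The difference between a smooth response far from a critical point and a
discontinuous jump near one can be sketched with a generic toy system. The
following is an assumption-laden illustration, not a model of the pyloric network:
it uses the cusp normal form dx/dt = current + gain*x - x^3, where the abstract
state `x` stands in for a response measure such as burst period, `current` plays
the role of injected current, and `gain` is a hypothetical stand-in for the
modulatory input from the second oscillator.

```python
def settle(x, current, gain, dt=0.01, steps=5000):
    """Relax dx/dt = current + gain*x - x**3 to a steady state (Euler steps)."""
    for _ in range(steps):
        x += dt * (current + gain * x - x ** 3)
    return x


def sweep(gain, n_points=81):
    """Raise the injected current from -1 to 1 in small steps, tracking the state."""
    currents = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    x = -1.0
    states = []
    for current in currents:
        x = settle(x, current, gain)  # start from the previous steady state
        states.append(x)
    return currents, states


def largest_jump(states):
    """Largest change in the state between neighbouring current levels."""
    return max(abs(b - a) for a, b in zip(states, states[1:]))


_, far_from_critical = sweep(gain=-1.0)  # stands in for blocked projections
_, near_critical = sweep(gain=1.0)       # stands in for intact projections
print("largest jump, modulatory input absent: ", round(largest_jump(far_from_critical), 3))
print("largest jump, modulatory input present:", round(largest_jump(near_critical), 3))
```

With negative gain the steady state changes smoothly with the current at every
step. With positive gain the same sweep produces one large jump where the lower
branch of steady states disappears, and over a range of currents two stable states
coexist, loosely analogous to the two modes of oscillation described above.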

The  advantage  of  using  bifurcation  analysis  may  not  be  appreciated  in  studies 
of  most  experimental  biological  systems  because  of  their  complexity  and  of  the  dif¬ 
ficulties  they  pose  in  permitting  selective  control  of  a  single  parameter.  The  utility 
of  the  analysis  becomes  more  obvious,  however,  in  computer  simulations.  Not  the 
least  utility  of  bifurcation  analysis  is  that  it  may  provide  some  predictability.  For 
example, Feigenbaum observed that the succession of period-doubling bifurcations
occurs in a universally predictable way. The ratio of differences in successive
bifurcations is given by $F_i = (\mu_i - \mu_{i+1})/(\mu_{i+1} - \mu_{i+2})$, where $\mu_i$ is the value of the
bifurcation parameter in the sequence of bifurcations from $i = 1, \ldots, \infty$. For many
bifurcation maps, $F_i$ quickly converges to 4.6692 to the fourth decimal place. The
pyloric  network  may  be  small  enough  to  permit  the  use  of  computational  methods. 
The  major  task  will  be  to  determine  what  parameter  to  control,  though  information 
from  neurohumoral  experiments  may  point  to  candidate  factors.  Different  bifurca¬ 
tion  states  may  use  the  underlying  network  architecture  in  different  ways.  The  way 
the network expresses the various firing patterns among its constituent neurons
is  not  predictable  from  knowledge  of  the  bifurcation  parameter  itself  nor  of  the 
anatomy  of  the  neuronal  connections.  Predictability  of  these  functional  or  emer¬ 
gent  networks  is  even  more  difficult  in  large  networks  or  if  variability  is  a  factor.  If 
there  are  many  weak  synapses,  there  may  be  insufficient  synaptic  power  to  control 
how  the  activity  traverses  the  connections  among  the  neurons.  Previous  activity  in 
the  network  may  alter  how  the  neurons  participate  in  the  future  to  produce  similar 
overall patterns of activity. Both factors have been observed in Pleurobranchaea
and may affect how the network responds during bifurcation.
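
The convergence of the Feigenbaum ratio mentioned above can be checked numerically
with a short sketch. The choices here are assumptions made for illustration: the map
x -> a - x^2 is used as a convenient period-doubling map, and the ratio is computed
from superstable parameter values (where the critical point x = 0 lies on the cycle)
rather than from the bifurcation points themselves, since both sequences share the
same limiting ratio. The function name and the starting guess of 3.2 are
hypothetical.

```python
def feigenbaum_estimates(max_level=10, newton_steps=10):
    """Successive estimates of the Feigenbaum ratio for the map x -> a - x*x."""
    a_prev, a_curr = 0.0, 1.0  # superstable parameters for period 1 and period 2
    ratio = 3.2                # rough starting guess, only seeds the first search
    estimates = []
    for level in range(2, max_level + 1):
        # extrapolate a first guess for the next superstable parameter
        a = a_curr + (a_curr - a_prev) / ratio
        for _step in range(newton_steps):
            x, dx_da = 0.0, 0.0
            for _iterate in range(2 ** level):
                dx_da = 1.0 - 2.0 * x * dx_da  # derivative of the iterate w.r.t. a
                x = a - x * x
            a -= x / dx_da  # Newton step toward a cycle of length 2**level through 0
        ratio = (a_curr - a_prev) / (a - a_curr)
        estimates.append(ratio)
        a_prev, a_curr = a_curr, a
    return estimates


estimates = feigenbaum_estimates()
for level, value in enumerate(estimates, start=2):
    print(level, round(value, 5))
```

The early estimates are noticeably off, but each further period doubling brings the
ratio closer to 4.6692, which is the sense in which the sequence of bifurcations
becomes predictable once a few of them have been located.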


4.2  MAMMALS

The importance of variation in brain function was, to our knowledge, noted first in
mammalian studies. The work of Adey and coworkers (see summary in Adey), done
over twenty years ago, on the chimpanzee and human electroencephalogram (EEG),
and on firing of cortical neurons in cats, clearly expressed the need to consider that
noise may have a crucial role in the organization of brain function. Adey noted
that while information must be contained in structure, the way the information is
expressed quite likely is not obtainable from knowing the connections of structure
itself. At about the same time, John discussed the problem of considering cortical
structure as statistical rather than as “switchboard” circuits that can be deciphered
simply by examining the connections. The ideas expressed by Adey and John were
seminal in solidifying reservations in our own laboratory about the viability of
ascribing whole-animal behavioral phenomena to simple neurocircuits. Wetzel and
Stuart clearly favored a variable neuronal group hypothesis to account for
vertebrate walking. More recently, Braitenberg examined the connectivity of visual
cortex and suggested that activity flowing through it may resemble a random walk.
Rapp et al. analyzed spontaneous firing in cortical neurons and suggested that
the variations observed in cortex may not be random, but rather may arise from
deterministic low-dimensional mechanisms such as chaos. Variation appears to be
an important avenue for self-organization of cooperative activity occurring
simultaneously over the entire surface of olfactory bulb. Freeman and Skarda have
proposed that the dynamical state of the bulb shifts from chaotic baseline
variations into memory-specific limit cycles that are evoked when the animal inhales
odors.

All of these findings are consistent with our own findings in Pleurobranchaea,
and, in turn, our findings suggest that the different variational types may provide for
response optimization into different attractors. Although the work in Pleurobranchaea
represents the first demonstration that chaotic activity underlies adaptive
responses in animals, it is necessary to take the evidence extremely cautiously, as
has been pointed out. However, to the extent that chaos does hold to be
the case in Pleurobranchaea, and in the various observations described above in
mammals, it may prove a general principle to pursue further that the variations
may not only convey information for a behavior but also may provide for one of
the methods for response optimization discussed in Section 3.


4.3  DIVISIONS  OF  THE  MAMMALIAN  MOTOR  SYSTEM:  RELATIONSHIP  TO 
DIVERGENCE  AND  CONVERGENCE 

Mammalian motor behavior may be classified as involving the pyramidal system
(PS) or the extrapyramidal system (EPS). According to the classical view, execution
of all voluntary movement in mammals is initiated by motor cortex acting through
the PS, which constitutes a two-neuron chain. The upper motor neuron descends
from the cortex and synapses in the spinal cord with the lower motor neuron,
which innervates the muscle. Going backwards, each muscle fiber is innervated by a
single lower motor neuron, which is contacted by only a few, perhaps a single, upper
motor neuron. So, each skeletal muscle of the body has a topical representation
in a specific zone of the motor cortex. Stimulation of a specific region results in a
stereotyped response, which, if the stimulus is focal, includes one muscle fiber only.
A given cortical neuron can act in two different states depending on the context
defined by preceding impulses from the associative cortex. This seems much like
a switchboard, showing a precise structure-function correspondence. It can function
as such, but the result is not the kind of movement we would like to perform. We
get an idea of what kind of movements the PS can produce by itself by watching
patients with dysfunction of the cerebellum or the basal ganglia, as in the case of
Parkinson’s disease. Their movements are coarse, as if the moving limb is not quite
sure of the goal. They often have heavy tremor, suggesting an imbalance of muscular
tone at rest. Similar imbalance during movement is indicated by rigidity, suggesting
that processing of the sensory information about continuously altered position is
not occurring fast enough or precisely enough. We might say that the PS does not
tolerate nearly as much error as the EPS. It is interesting to emphasize that in cases
of cerebellar infarcts or in Parkinson's disease, the spinal cord with all its reflexes
is supposed to be intact and functioning the best it can perform. Therefore, the
PS may exhibit considerably less convergence of overlapping information and less
distributed action. The one-to-one mapping allows the PS to execute precise control
of movement but may make it extremely error prone should a particular line fail,
whereas the EPS may exhibit less precise control yet may be less error prone when
its components fail.

Although the physiological finding that given muscular responses can only be
obtained by stimulation of certain cortical neurons indicates that there is little
convergence, histochemical data suggest that multiple transmitter systems,
presumably from the EPS and spinal cord, converge onto the lower motor neuron. The
substances involved include dopamine, noradrenaline, serotonin, histamine,
substance P, and thyrotropin releasing hormone (TRH). The upper motor neuron
shows some degree of divergence, since its collaterals contact EPS neurons
and spinal cord interneurons before synapsing with the lower motor neuron.

Classically, anything regulating motor functions other than the PS is defined
collectively as the EPS. It includes the basal ganglia, the vestibular system, and the
cerebellum, and it is thought to be responsible for coordination of movements. Its
components connect indirectly with the PS both at cortical and spinal cord levels.
The components of EPS are highly interconnected, although the precise circuitry is
incompletely known. A high degree of convergence and divergence is likely to occur
in the EPS, as suggested by the morphology of, e.g., the cerebellar Purkinje cells
and basket cells. By contrast, the PS has significantly fewer connections among its
constituent neurons.

This distinction between PS and EPS, however, may not be immutable, as
indicated by motor learning. Consider a musician learning a new piece or a dancer
learning a new number. Initially, the motor pattern is established under cortical
control. This always happens relatively slowly and, once it gets fast enough, the
cortex cannot handle it and may even inhibit the pattern. Where is the pattern
transferred to? It must be some subcortical level that takes over the pattern. All
we know is that the control levels must be above the lower motor neuron, which
is the final common pathway, and that the pattern must be processed by the EPS.
Control can be switched back and forth between the different levels, but the PS
and EPS seem almost to have switched their functional categorization. To be sure,
learning may model EPS to conform to convergence architectures that exhibit less
convergence and variation, as discussed below in relation to Figure 4.

The diffuse reticular activating system (RAS) is perhaps most apropos to
discussions of convergence and divergence, and adds a control factor that must be
considered with all somatic motor functions. We know from everyday experience
that rather sophisticated motor activity can take place at the lowest states of
activation (sleepwalking), or rather gross errors may occur if the state of activation
is overly high. The structure classically thought to be related to the state of
activation is the RAS of the brain stem. Interestingly, this is not really a structure in
the sense of the nuclei or the cortex. Rather, its neurons are diffusely spread over
a large proportion of the brain stem. Considering the anatomical fact that most of
the vital regulation centers are located in that region over a very small space, RAS
must be in contact with just about everything. It has been thought that RAS
controls mainly autonomic vital functions. However, it has turned out that a reticular
system is found all over the spinal cord as well. So it is reasonable to expect that
RAS is intimately involved with motor functions too. (Our guess is that the RAS
extends over all the cortex as well, if we only had markers to identify the cell types.)
Thus, a better understanding of differences in the connectivity and function of the
PS, EPS, and RAS, and their interactions, may shed some light on the functional
significance of convergence and divergence.


5.  NEUROMODULATION 

5.1  CONVERGENCE  AND  DIVERGENCE  OF  NEUROTRANSMITTER 
SYSTEMS 

5.1.1 INVERTEBRATES In the classic view, experimental manipulation of individual
neuromodulators often generates predictable effects, as has long been
demonstrated in other animals. Our own work began with a similar intention:
to identify behavior-specific neurotransmitter evidence relating to associative
learning. There is good pharmacological evidence for the classically defined type of
cholinergic muscarinic receptors (and of a new form) in Pleurobranchaea. Behavioral
evidence shows that muscarinic receptors have a role in associative learning.
Development of immunofluorescence methods for detecting the transmitter for these
receptors, acetylcholine (ACH), has allowed us to identify the location of presynaptic
cholinergic neurons. Using complete serial histological sections to examine
the full extent of the projections led us to the finding that we should have expected
from our physiological work, but, interestingly, did not. The histology showed
that relatively few cells diverge profusely throughout the nervous system, hardly
leaving any portion of the neuropil untouched.

This led us to examine the distribution of over a dozen putative neurotransmitters
in complete serial sections of all ganglia in both Aplysia and
Pleurobranchaea. Examples of these findings are shown in Figure 2 (A-F) for
Aplysia and in Figure 2 (G-I) for Pleurobranchaea. Each transmitter we examined
involved a few neurons that diverged and converged extensively over the same target
areas of the neuropil, and on individual neurons. The alternative possibility
that neurotransmitters projected selectively onto different areas was seldom seen.
Our present working hypothesis, which is being examined physiologically, is that
there may be little motor specificity in the projection of neuromodulators, though
there may be differences in their actions. Recent physiological findings in Aplysia
support this hypothesis since individual bath-applied transmitters and
neuromodulators appear to affect all motor systems examined.


FIGURE 2 A-F: Photomicrographs of the neuropil region of Aplysia buccal ganglion
showing immunoreactivity for (A) histamine, (B) serotonin, (C) ACH, (D) GABA
(gamma-aminobutyric acid), (E) VIP (vasoactive intestinal peptide), (F) FMRFamide
(Phe-Met-Arg-Phe-NH2); cross in (C) indicates immunoreactive neuropil, and the arrowhead
shows immunoreactive terminals around nonreactive neurons. Bar = 100 µm (A,D,E,F)
or 50 µm (B,C). (G)-(I) (now labeled (A)-(C): will be changed): Photomicrographs of
the neuropil region of Pleurobranchaea buccal ganglion showing immunoreactivity for
(G) histamine, (H) GABA, (I) FMRFamide. Bar = 100 µm. Note the extensiveness of
the immunoreactive coverage throughout the neuropil in all tissues from both animals.
Positive immunoreactivity is indicated by the white profiles that are extensively
distributed over the black nonreactive areas. For reference, in
Figure 2(I), FMRFamide covers the entire neuropil of the buccal ganglion. The large
cell at the right is the buccal giant, and the commissure leading to the left half of the
buccal ganglion is at the left margin. The anterior margin of the ganglion is delineated
by the row of dimly stained cells at the top of the micrograph, and the posterior margin
is shown at the bottom edge of the neuropil. The area between the neuropil and the
row of dimly stained cells contains cell bodies which are not seen because they contain
no immunoreactivity. Reprinted with permission from Biol. Bull. 181 (1991): 484-499.


Given the physiological finding of extensive convergence and divergence in
Pleurobranchaea and the corollary finding in Aplysia that sensory stimulation
activates perhaps the majority of neurons in a ganglion, the interesting possibility
arises that many or possibly all neurotransmitters may often become active at the
same time. In this case, the classic view of neuromodulation that has been generated
using selective applications of single transmitters may not provide adequate insight
into the physiological effects produced under normal behavioral conditions. The
classic view comes, we believe, dangerously close to making an unstated assumption
that the effects of the individual transmitters on common target neurons sum
linearly. But if conditions arise when the interactions are nonlinear, the classic
experimental approach provides us with little insight into how neuromodulation acts
to control network function in normally behaving animals.
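
The point about linear summation can be made concrete with a small numerical sketch. The sigmoidal response function, its gain and threshold, and the two "drive" values below are all hypothetical choices made for illustration; they are not measurements from any preparation discussed here. The sketch only shows that, for a saturating target, the effect of two inputs applied together need not equal the sum of their effects when applied one at a time.

```python
import math


def response(drive, gain=4.0, threshold=1.0):
    """Sigmoidal response of a hypothetical target neuron to total drive."""
    return 1.0 / (1.0 + math.exp(-gain * (drive - threshold)))


def effect(drive):
    """Change in response relative to the unstimulated baseline (drive = 0)."""
    return response(drive) - response(0.0)


# Hypothetical drive contributed by each of two transmitters applied alone.
drive_a = 0.6
drive_b = 0.6

effect_a = effect(drive_a)                   # transmitter A alone
effect_b = effect(drive_b)                   # transmitter B alone
linear_prediction = effect_a + effect_b      # what linear summation would predict
combined_effect = effect(drive_a + drive_b)  # both transmitters together

print(f"A alone:            {effect_a:.3f}")
print(f"B alone:            {effect_b:.3f}")
print(f"linear prediction:  {linear_prediction:.3f}")
print(f"A and B together:   {combined_effect:.3f}")
```

With these assumed values the combined effect exceeds the linear prediction; other choices of gain and threshold can make it fall short instead, which is the sense in which single-transmitter experiments may fail to predict joint action.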

5.1.2 VERTEBRATES As in the above discussion, we provide only selected examples
here. Extensive innervation by nerve fibers staining for a large number of
transmitters, such as ACH, dopamine, serotonin, histamine, GABA, taurine, glutamate,
enkephalin, angiotensin, cholecystokinin, TRH, and vasoactive intestinal
polypeptide, has been described in the mammalian striatum. Likewise, multiple
transmitters (ACH, serotonin, noradrenaline, glutamate, GABA) have been localized
throughout the cerebellar cortex. The wulst ("bulge") is a structure in the avian
brain that resembles the mammalian neocortex. It is bipartite and runs the length of
the dorsomedial portion of the hemisphere. A medial portion is similar to the
mammalian hippocampus (wulst regio hippocampalis, Wrh), and a lateral portion is
similar to regions of the somatosensory neocortex (wulst regio hyperstriatica, Whs).
Both structures are laminated, permitting experiments that can determine whether
neurotransmitters are differentially distributed between and within laminae. Shimizu
and Karten examined the immunohistochemical location of cell bodies and fibers
containing serotonin, ACH (through localization of choline acetyltransferase, CHAT,
and nicotinic receptors, nACHR), catecholamine (through localization of the enzyme
tyrosine hydroxylase), GABA (through localization of the enzyme glutamic acid
decarboxylase, GAD, and the GABAa receptor), and the neuropeptides substance-P (SP),
leucine-enkephalin (L-ENK), neuropeptide Y (NPY), neurotensin (NT), somatostatin
releasing-inhibiting factor (SRIF), corticotropin releasing-factor (CRF), vasoactive
intestinal polypeptide (VIP), and cholecystokinin (CCK). Although these substances
exhibited laminar specificity, evidence was obtained showing that many regions of
the Whs contained overlapping transmitters and neuromodulators. For example, in some
portions of a large region, the hyperstriaticum accessorium, evidence was obtained
for all substances except CCK, though the density of distribution for each substance
was different.

An ideal structure to use for such purposes in vertebrate animals is the retina
because of its well-known function and neuroarchitecture, and the ease with which
its various cell types can be identified. Present findings indicate that many
neurotransmitters and neuromodulators are located in the various cells of the
retina, but the methods do not show clearly enough how much divergence and
convergence occurs among the cells in the retina or wulst, and how much occurs from
the retinal ganglion cells onto other brain areas. A better method of analysis is to
use evidence from the location and distribution of transmitter receptors. Progress
in the laboratory of Professor Harvey J. Karten at the Department of Neuroscience,
University of California at San Diego, indicates that individual retinal cells
contain receptors for many different neurohumoral factors, and that many cells stain
for the same receptors, indicating that there is extensive convergence and
divergence of neurotransmission and neuromodulation. Because of its experimental
approachability and well-known function, the retina may provide a rich experimental
source for understanding how multiple converging factors interact to control
neuronal function.

In human physiology, Parkinson's disease is probably the best-known example of a
transmitter-specific defect in motor function. Its cause is considered to be a
decrease in the activity of the dopaminergic nigrostriatal tract. Clinical neurology
has established that when the amount of dopamine is too low, the action of the
dopamine antagonist, the cholinergic system of the basal ganglia, becomes too
strong. The treatment, L-dopa, increases dopamine levels to restore the balance
between the two systems. However, nothing here proves that the action of the
dopamine-ACH system is necessarily based on fixed circuits and that it acts
individually in normal brain function. Although dopamine is found in a specific
tract, we do not know how much divergence or convergence is involved in that system,
and what the effects may be when many neurons and transmitters act together.

Although  the  pituitary  is  not  a  classically  definable  motor  organ,  it  provides  an 
excellent  example  of  multi-humoral  control.  The  intermediate  lobe  is  a  morpholog¬ 
ically  homogeneous  group  of  cells  that  all  contain  the  same  hormones,  melanocyte- 
stimulating  hormone  and  beta-endorphin.  The  question  is  why  are  so  many  different 
transmitters  needed  for  the  simple  regulation  of  inhibition-excitation.  Stimulatory 
(serotonin, ACH) and inhibitory (dopamine, opioids, probably GABA) actions have
been  described  for  one  substance  at  a  time,  but  we  have  no  idea  how  these  sub¬ 
stances  act  together.  Since  the  output  is  so  simple  and  easily  measurable  (hormone 
secretion),  this  tissue  may  provide  a  model  to  study  the  implications  of  divergence 
and  convergence  of  multiple  neurotransmitter  inputs. 

Figure 3 summarizes some of our findings in rat pituitary. The data clearly
support  the  possibility  of  high  convergence  onto  the  same  target  areas,  but  since 
there  is  presently  no  morphometric  evidence  of  how  many  neurons  provide  the 
innervation,  we  cannot  presently  provide  an  estimate  of  the  ratios  of  convergence 
and  divergence.  The  pituitary  is  particularly  interesting  since  the  output  of  the 
system  in  response  to  converging  actions  is  neurohumoral  rather  than  electrical. 

In conclusion, we suggest that the properties of nonlinearity, distributed function,
variability, multifunctionality, convergence/divergence, and the likelihood that the
system is error-prone, all of which we have attributed to the electrical
neurocircuit, may also be ascribable to neuromodulation. It may be possible to
obtain repeatable effects when controlling certain transmitters, but what the
effects may be or how to conceptualize the interaction of many transmitters (acting
at very low concentrations) is presently unclear. If the dynamics of target
processes (electrical or chemical) are far from bifurcation points, the
nonlinearities (or any effect) may not be observable. But given that the bifurcation
points are accessible, the number of possible effects arising from electrical
nonlinearities and from the effects of transmitters, cotransmitters, and
neurohormones becomes enormous. If we are to believe that neurohumoral agents act
variably and in concert, then we must envision further that the subcellular
mechanisms that each of these receptors and channels activates may lead to
converging and diverging nonlinear actions within the cell itself. Thus, it is
conceivable that the clarity of the mechanisms presented for a single
neurotransmitter or a single second-messenger system may be somewhat misleading. The
point that needs to be examined further is that there may be many different sites of
converging interactions in biological systems that process the same information in
parallel, and perhaps in different ways, but may be capable of sharing the results
of such processing. Thus, systems may exist in which it may not be possible to
ascribe unique function to any motor, cellular, or subcellular process.


FIGURE 3 Photomicrographs of rat pituitary. al: anterior lobe; il: intermediate
lobe; pl: posterior lobe. (A) Acetylcholine. (B) MEAGL
(Met5-enkephalin-Arg6-Gly7-Leu8). (C) Serotonin. (D) GABA. (E) Tyrosine hydroxylase,
the dopamine-synthesizing enzyme. Note convergence of these substances onto similar
areas of the intermediate and posterior lobes, as shown in Figure 2 for neural
tissues of Aplysia and Pleurobranchaea.


6. REDUCTION AND EMERGENCE IN CONTROL
MECHANISMS

How are these widely distributed physiological and neurohumoral processes
controlled? We suggest that many are not, at least not explicitly. It would be too
costly, for the same reasons that it would be too costly to devise neurocircuits for
each behavior. It seems better to allow the system to be error-prone. As discussed
in studies on Pleurobranchaea, some looseness may actually be beneficial, since
systems needing to be highly tuned to specific tasks may prove to be brittle in
variable, unpredictable environments. Put differently, it seems better to allow the
interaction between the organism and the environment to determine the behavior than
to "hard-wire" all of the behaviors that an animal can perform.


6.1  TRANSMITTERS  CONTROL  NETWORK  FUNCTION  AND 
ARCHITECTURE 

There are, of course, demonstrable control mechanisms showing hard-wiring that we
need to remember. For example, as we have mentioned previously, it has been shown
that selective application of neurotransmitters evokes different patterns of
activity in simple ganglia, just as there is a vast textbook literature showing
evidence of the classical "neurocircuit." Most published evidence weighs heavily in
this direction. Thus, good evidence exists to show that "Each neurotransmitter or
neurotransmitter system may ... be able to elicit, from the same neuronal circuit, a
characteristic and different 'operational state.' In this way it would be possible
to obtain a wide range of stable neuronal outputs from a single circuit."

A remarkable series of experiments by Kater and coworkers (e.g., Kater and Mills,
and Lipton and Kater), begun initially in the freshwater snail Helisoma and now
extended to mammalian neural tissues, shows the ability of transmitter receptors to
control neuronal growth, plasticity, and even survival of neurons. The work has
examined a spectrum of neurotransmitters and neuromodulators, including ACH, GABA,
dopamine, glutamate, norepinephrine, serotonin, somatostatin, and VIP. Taking
advantage of cell culture of identified neurons, the work has been able to provide a
strong basis of control experiments. As one example in Helisoma, serotonin, an
excitatory transmitter in this system, retards neurite outgrowth, whereas the
addition of ACH, an inhibitory transmitter, prevents the serotonin-induced
inhibition. The transmitters work through the depolarization state of the cell. For
example, presenting an excitatory transmitter alone retards the normal neurite
outgrowth, but superimposing hyperpolarizing current on transmitter-induced
excitation allows the neurite to resume its normal growth rate. The transmitters may
act either through voltage- or receptor-activated channels on a common intracellular
messenger, calcium. As Lipton and Kater summarize, neuronal architectures (and
therefore neurocircuits) are determined by a fine balance in the activation of these
two types of channels through an interplay of excitatory and inhibitory transmitters
(though different mechanisms may be used in other neural systems; see Garyantes).

The term "balance" clearly indicates that Lipton and Kater are aware that control in
natural biological systems may be high-dimensional, since neural tissues are known
to contain many transmitters. The problem, then, is to determine how the high
dimensionality is expressed. One possibility is that there is simple linear
summation of the effects produced by the various transmitters. However, it is well
known that the electrogenic properties of the postsynaptic cell can easily change a
simple synaptic input into a nonlinear response. Twenty years ago, Wilson and Cowan
conducted computer simulations on a population model to illustrate that groups of
cells intercommunicating through excitatory and inhibitory connections exhibit
damped oscillations, multiple stable states, and, under certain constraints, stable
limit-cycle oscillations in the number of excitatory and inhibitory neurons firing
per unit time. A rather interesting feature of the model is that local interactions
were essentially random, yet the long-range effects were quite organized. Another
interesting feature of the model that is pertinent to the present discussion is that
the populations of excitatory and inhibitory cells were homogeneous; differences
arose statistically through use and refractory period. Even in simpler networks,
one-shot activation of converging inputs to a common neuron can lead to linear and
nonlinear effects in the postsynaptic cell. In single neurons, it may be possible to
generate many different periodic and aperiodic firing patterns by means of fine
adjustments to a single ion channel. This latter study also showed that
intracellular calcium concentration may fluctuate differentially and nonlinearly in
each dynamical state. Therefore, the controlling balance between converging
transmitters and neuromodulators that affect neuronal structure need not be a simple
linear affair. What may seem a linear balance, under some parameter ranges of the
neurohumoral state, can easily switch to drastically different conditions at
critical bifurcation conditions.
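
The kind of excitatory-inhibitory population model referred to above can be sketched in a few lines. The equations below follow the commonly quoted two-population form of the Wilson and Cowan model; the coupling constants, sigmoid parameters, external inputs, time step, and initial conditions are assumptions taken from values often used for illustration, not from the simulations cited in the text, and the simple Euler integration is chosen only for brevity.

```python
import math


def sigmoid(x, a, theta):
    """Population response function, shifted so that sigmoid(0) == 0."""
    return 1.0 / (1.0 + math.exp(-a * (x - theta))) - 1.0 / (1.0 + math.exp(a * theta))


def simulate(P=1.25, Q=0.0, steps=20000, dt=0.005,
             c1=16.0, c2=12.0, c3=15.0, c4=3.0,
             a_e=1.3, theta_e=4.0, a_i=2.0, theta_i=3.7,
             e0=0.1, i0=0.05):
    """Euler integration of excitatory (e) and inhibitory (i) population activity.

    P and Q are external inputs to the two populations. All parameter
    values are illustrative assumptions.
    """
    # Upper bounds of the response functions (refractory constant taken as 1).
    k_e = 1.0 - 1.0 / (1.0 + math.exp(a_e * theta_e))
    k_i = 1.0 - 1.0 / (1.0 + math.exp(a_i * theta_i))
    e, i = e0, i0
    trace = []
    for _ in range(steps):
        de = -e + (k_e - e) * sigmoid(c1 * e - c2 * i + P, a_e, theta_e)
        di = -i + (k_i - i) * sigmoid(c3 * e - c4 * i + Q, a_i, theta_i)
        e += dt * de
        i += dt * di
        trace.append((e, i))
    return trace


trace = simulate()
late_e = [e for e, _ in trace[len(trace) // 2:]]
print("excitatory activity, second half of run: "
      f"min={min(late_e):.3f} max={max(late_e):.3f}")
```

Rerunning `simulate` with different values of the external input `P` is one way to explore how the same connectivity can settle to a steady level or keep oscillating, which is the sense in which a seemingly simple balance can change character at a bifurcation.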

The dynamics of interactions arising in populations of cells need not employ the
full high-dimensional space. Going back to our notion of attractors, the different
dynamics that a network will allow determine the characteristics of temporal
visitation of activity at any given neuron in the coactive group; i.e., a set of
connections will be activated differently by the types of attractors that it can
sustain. Although a developing network at some primitive state may exhibit different
dynamical capabilities than a finely tuned, mature one, the same questions of
nonlinear conditions arise in both. Finally, if attractors arise either in the
responses of single neurons or in networks of them, the high dimensionality we see
in the number of transmitters present may not necessarily be expressed as a
high-dimensional process. It is an interesting possibility, raised by numerical
studies, that coordinated activity in potentially high-dimensional systems often
results in low-dimensional attractors. From a simple listing of the number of
transmitters resulting from experiments in which transmitters are applied one at a
time or in pairs, it is not evident how the system dynamically collapses into
low-dimensional control, and which of the transmitters become involved. Even in
small model networks in which all of the driving differential equations are known,
it is not obvious from the equations themselves, nor presently from the
connectivity, how it is that a lower dimensionality arises from a larger possible
set of available variables unless the system is examined after activating it.
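
A toy illustration of such a collapse, offered only as a sketch: the model below is a mean-field system of coupled phase oscillators (a Kuramoto-type model), which is a stand-in chosen for brevity and is not the model used in the numerical studies cited above. The number of units, coupling strength, frequency spread, and random seed are all hypothetical. Each unit has its own state variable, yet with sufficient coupling the population's activity can be summarized by a single collective phase, as measured by an order parameter r between 0 (incoherent) and 1 (fully coordinated).

```python
import math
import random


def order_parameter(phases):
    """Return (r, psi): coherence r in [0, 1] and mean phase psi of the population."""
    n = len(phases)
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    return math.hypot(c, s), math.atan2(s, c)


def simulate(n=50, coupling=2.0, spread=0.1, dt=0.01, steps=5000, seed=0):
    """Euler integration of n mean-field coupled phase oscillators.

    Returns the coherence before and after the run. All parameter values
    are illustrative assumptions.
    """
    rng = random.Random(seed)
    freqs = [rng.uniform(-spread, spread) for _ in range(n)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    r_start, _ = order_parameter(phases)
    for _ in range(steps):
        r, psi = order_parameter(phases)
        # Each unit is pulled toward the mean phase in proportion to the coherence.
        phases = [p + dt * (w + coupling * r * math.sin(psi - p))
                  for p, w in zip(phases, freqs)]
    r_end, _ = order_parameter(phases)
    return r_start, r_end


r_start, r_end = simulate()
print(f"coherence at start: {r_start:.3f}")
print(f"coherence at end:   {r_end:.3f}")
```

As the text notes, nothing in the list of variables or in the form of the equations announces this reduction in advance; it becomes visible only after the system is run.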

Given a linear system, it may be possible to say that neurotransmitters are
architects of neural structure. But, as we shall discuss later in Section 8 when
dealing with bifurcation in minimal networks, conditions may arise when the activity
itself is what fine-tunes a network, and in turn, the network redefines the type of
activity that can emerge. There is a dialectical interplay between the two elements,
and this dialectic, we believe, can act as an architect of neurons and circuits. The
chain of events that we envision as controlling cell structure is as follows: The
dynamics of firing in individual neurons and in networks of them act on structure
through transmitters; the transmitters act on the cell through calcium. The dynamics
of changes in intracellular calcium set up a chain of events that affect cell
growth. But cell growth redetermines what the dynamics will be, and so forth
recursively. Other factors may contribute, such as synaptic competition. If the
notion that many neurons act in close temporal association, or in coordination, is
correct, we must then add the complication that the system as a whole is extremely
high dimensional and that many types of nonlinearities may occur. As we shall
discuss below regarding the locus of learning, there may be no sine qua non balance
of neurohumoral agents for a given architecture to appear. Although there may be
many systems in which there is always a precise connection between the balance of a
particular set of chemical elements and structure, understanding these systems gives
little insight into others in which variability is an issue.

Thus, while the scientific method at our disposal provides elegant connections
between cause and effect, much as Descartes and Euclid would like us to believe, the
possibility of high-dimensional space, of nonlinearities, and of the dialectic
between structure and dynamics indicates that our view of complex systems may be too
simple. However, the scientific methods, as they are, are nonetheless the only ones
we have. Therefore, our concern is not that the methods and conclusions are
simplistic but rather that they do not address fundamental questions that need to be
asked. Moreover, the clarity of some of these reductionistic methods and the
importance of the resulting findings have overshadowed the need to go beyond them
and to develop methods of data collection that may be useful in taking that step.

6.2  CONTROL  OF  WHOLE-ANIMAL  BEHAVIOR:  CRITIQUE  OF 
REDUCTIONIST  EXPLANATION  OF  LEARNING  IN  APLYSIA 

6.2.1 SYNAPSE-SPECIFIC CONTROL OF BEHAVIOR A tradition in invertebrate neurobiology
holds that an advantage of using invertebrate animals is that once a behavior is
identified with a particular motor pattern, the same behavior can then be studied
neurophysiologically in the motor patterns of isolated nervous systems. As discussed
briefly in subsection 3.3, this is quite difficult to do in Pleurobranchaea.
However, the most elegant example of such reductionist approaches has been the
identification of site-specific learning in the gill-withdrawal response in Aplysia.
A long series of studies has attempted to show how changes at monosynaptic sites
between sensory neurons and motor neurons can explain whole-animal phenomena such as
sensitization, dishabituation, and associative learning. The mechanism involves
serotonin as a neurotransmitter in the reinforcing pathway. The original series of
experiments showed that activation of serotonin receptors on sensory neurons leads
to a chain of events involving adenosine 3',5'-monophosphate (cyclic AMP) that
depresses a potassium current when the cell fires. This exposes an inward calcium
current that broadens the action potential and, owing to the increase in
intracellular calcium, leads to increased transmitter release onto the follower
motor neuron. A group of sensory cells, referred to as the LE neurons, which are
usually activated electrically in isolated ganglia, provides the input to identified
motor neurons, of which neuron L7 is perhaps the most important in terms of its
effect on the movement of the gill. A group of cells, referred to as L29, provides
the serotonergic input.

6.2.2  COMPLICATIONS  A  number  of  important  extensions  and  problems  have  arisen 
that  both  greatly  illuminate  and  complicate  this  simple  model  system.  We  cite  only 
a few examples:

1.  Peripheral  nervous  system.  From  the  beginning  of  work  in  the  late  1960s,  ev¬ 
idence  has  existed  indicating  that  emergent  effects  may  involve  the  peripheral 
nervous  system  which  is  distributed  within  the  gill  itself.  Indeed,  in  many  cases 
the  abdominal  ganglion  seems  not  to  be  necessary  for  generating  robust  gill 
withdrawal responses and simple forms of learning.

2.  Complex  behavior.  The  once-presumed  simple  withdrawal  reflex  has  turned  out 
not  to  be  so  simple,  and  consists  of  several  different  types  of  movements. 

3.  Neuronal  function.  Some  of  the  major  identifiable  motor  neurons  have  variable 
function  within  the  same  experimental  preparation  within  the  same  behavior. 
This raises strong questions in Aplysia as to the validity of assuming that
identified neurons have consistently the same role in a given behavior, much as
Mpitsos and Cohan have raised regarding the function of neurons in Pleurobranchaea.

4.  Complex  network.  Small,  well-localized  sensory  taps  activate  perhaps  half  of 
the  cells  in  the  abdominal  ganglion,  showing  that  there  is  extensive  divergence 
of sensory and possibly other effects.

5. Non-constant activity. Cells partaking in successive taps are variable,
suggesting that localization of the network may be difficult or impossible.

6. Source of serotonergic control is unidentified. Activating L29 produces enhanced
transmitter release. Serotonin applied experimentally produces the same effect. But
L29, which was thought to provide the serotonergic enhancement, apparently does not
contain serotonin.

7. Multiple neurohumoral factors enhance synaptic release. We now know that at least
two other transmitters, small cardioactive peptide A and B (SCPa, SCPb), broaden
action potentials in LE cells and produce synaptic facilitation on their follower
motor neurons, but apparently they are not located in L29. Interestingly, SCPb
produces spike broadening but not facilitation of transmitter release in depressed
sensory neurons, which may relate to mobilization of transmitter.

8. Multiple subcellular processes. There may be diverging cyclic AMP-dependent
processes in different forms of synaptic facilitation. Conversely, in both the
gill-withdrawal system and the analogous tail-withdrawal system, cyclic
AMP-dependent and cyclic AMP-independent subcellular processes may converge onto the
same spike-broadening mechanisms in the sensory neurons.

9.  More  than  one  group  of  sensory  inputs.  The  possibility  has  been  raised  that  un¬ 
der  some  conditions,  novel  sensory  neurons  may  be  involved  in  modification  of 
a  siphon  withdrawal  response  whose  behavioral  modification  has  been  thought 
to  be  controlled  by  changes  in  the  LE  sensory  neurons. 

10. LE cell activity lacks timing to be primary site of learning. Most importantly,
it now appears that there is a second group of sensory cells that have lower
thresholds than the LE cells, and are probably more likely than the LE cells to be
activated during training of the gill-withdrawal response itself. It has now been
reported that responses in mechanoactivated LE cells in all of the 32 preparations
that were tested always occurred after the initiation of the discharge in the motor
neurons. Their timing in the behavioral reflex has been difficult to determine. The
problem, then, is that if the cellular basis of behavior relies on the LE cells as
the site of facilitated transmitter release, the responses of the LE cells must
occur before the initiation of motor output for that behavior, but the recent
findings show clearly that they do not.


6.3  EMERGENT  CONTROL  OF  APLYSIA  BEHAVIOR:  PARALLEL 
DISTRIBUTED  PROCESSING 

6.3.1 DON'T WORRY, BE HAPPY: NEW SYNTHESIS It might be tempting to some interpreters
of the above-mentioned complications in Aplysia to disparage the original
conclusions about site-specific learning. We believe, however, that that would be a
mistake. To dismiss the original conclusions would be to fall to the temptation that
has faced previous work on learning in Aplysia, and most such attempts in other
animals: the belief that there is, in fact, some other reducible locus of learning,
or some reducibly identifiable neurocircuit as the generator of behavior. But by
making the dismissal, one would miss the more important issue that emerges from the
findings, namely, that the data may be influential in redirecting the focus from
reductionism to a higher level of analysis. It is not just that behavior may be
different on different occasions. A general scheme appears to have emerged in all of
the work on Aplysia that is not inconsistent with the findings we have obtained in
our attempts to understand the integrative processes that generate behaviors in
Pleurobranchaea. This scheme relates to our discussion above of parallel processing
arising from the extensive distribution and sharing of information, as we summarize
below in subsections 6.3.2 and 6.3.3.

6.3.2 THE LOCUS OF LEARNING MAY NOT BE AT A UNIQUE CELLULAR SITE The evidence cited
in the above list of complications may be reinterpreted as in the following general
scheme: Different sites in the nervous system are capable of generating similar
components of the same behavior, and each site is capable of affecting the other;
i.e., there is apparently extensive convergence and divergence between different
sensory and motor centers. Within a given sensory-motor system, divergence is an
inherent effect of even small, highly localized stimulation. At the same time,
different sensory pathways converge on the same motor neurons. Similar convergence
occurs among neurohumoral systems and their subcellular effects. Thus, mounting
evidence indicates a cascade of diverging and converging chemical interactions that
distribute sensory and motor effects widely.

Evidence  exists  that  supports  these  possibilities.  For  example,  we  know  that 
weak,  highly  localized  tactile  stimulations,  as  used  in  training  experiments  to  show 
learning,  activates  large  numbers  of  neurons, i.e.,  that  divergence  distributes  in¬ 
formation  over  many  cellular  loci.  We  also  know  that  learning  occurs  in  both  the 
peripheral  and  central  components  of  the  nervous  system  of  Aplysta  (see  review 
in  Mpitsos  and  Lukowiak^"*®).  We  also  know  from  studies  in  isolated  nervous  sys¬ 
tems  and  from  more  intact  preparations  that  conditioning-related  changes  occur  on 
LE  sensory  neurons  that  synapse  on  different  gill  motor  neurons.  Training-induced 
changes  may  occur  at  the:»euromuscular  junction.*^*  Additionally,  changes  may  oc¬ 
cur  during  training  that  follow  all  of  the  criteria  established  for  associative  learning 
but  which  do  not  take  place  between  the  sensory  neurons  and  their  follower  neurons. 
For  example.  Lukowiak  and  Colebrook^'^  have  obtained  evidence  of  associative 
conditioning  that  excludes  the  major  gill  motor  neurons.  The  conditioned  stimulus 
(CS)  consisted  of  weak  tactile  stimulation  of  the  siphon  skin.  The  unconditioned 
stimulus  (UCS),  in  one  set  of  experiments,  consisted  of  strong  electrical  stimulation 
of  the  pedal  nerve  which  connects  the  brain  with  the  foot,  and  in  another  set  of 
experiments,  it  consisted  of  strong  tactile  stimuli  to  the  gill  itself.  During  training, 
dual  intracellular  recordings  were  made  from  sensory  neurons  and  major  identifiable 
gill  motor  neurons  {Lr,  LDGi,  LDG2,  T9).  The  movement  of  the  gill  itself  was  also 
monitored.  In  the  course  of  training,  the  CS  produced  gill-withdrawal  movements 
that  increased  as  a  function  of  the  number  of  training  trials,  and  the  efficiency  of 
the  sensory-to-motor  neuron  synapses  increased.  Appropriate  control  experiments 
showed  that  the  effects  were  consistent  with  associative  conditioning.  However,  the 
number  of  action  potentials  produced  in  the  motor  neuron  in  response  to  the  CS 
correlated  well  with  the  actual  movement  of  the  gill  only  during  the  initial  stages 
of training. Most of the amplitude changes in the gill-withdrawal response were
not correlated with any changes in the number of action potentials generated in the
motor neurons. In another set of experiments, designed to mimic associative learning
observed in whole-animal studies, evidence was obtained for associative learning
in  a  significant  number  of  reduced  preparations  in  which  there  was  an  increase  in 
the  number  of  action  potentials  produced  in  the  motor  neurons,  but  there  was  no 
change  in  the  amplitude  of  the  gill-withdrawal  response. 

Findings such as these show that associative learning, and simpler forms of
learning such as sensitization and habituation, may take place at many different
loci. Thus, as regards complication 10 noted above, it is not too big a jump to realize
that learning could also happen in classes of sensory neurons other than the LE cells,
and eventually to discover that learning-related physiological changes may also be
shown postsynaptically in the motor neurons themselves, not just presynaptically in
the sensory neurons. Additionally, as Mpitsos et al. have pointed out in detailed
control studies of associative learning in Pleurobranchaea, let us not be wedded
dogmatically to a definition of associative learning that forces physiology to comply
with a particular protocol of stimulus presentations applied by the experimenter
to whole animals. Single-trial training in this study showed, for short intervals
between the CS and the UCS, that backward conditioning produced almost as strong
conditioning as forward conditioning. Mpitsos et al. point out that what may be
temporally controllable experimentally in the application of sensory inputs may not
hold physiologically. The same set of subcellular mechanisms producing learning-related
changes in the forward relationship between the CS and UCS (which is required by the
definition of associative learning) may exist to some extent when the stimuli are
presented in close temporal pairing but in reverse order. Thus, changes arising
from both the forward and backward temporal relationships between the CS and
UCS can represent associative learning (though this does not exclude arguments
for different mechanisms, should they occur, to account for backward conditioning).
For these reasons, it also may not be too big a jump to accept the fact that learning
may still take place in the LE neurons of Aplysia, even if their responses arising
from  stimulation  of  sensory  skin  do  not  occur  until  after  the  motor  neurons  are 
activated  by  other  sensory  neurons. 

Thus, while it is possible that a unique “locus of learning,” the engram in
Aplysia,  might  still  be  found,  the  data  indicate  strongly  that  the  system  seems  to 
consist  of  many  parallel,  redundant,  and  possibly  interacting  components,  none  of 
which  may  be  the  sine  qua  non  element  in  the  learning  process  or  in  the  generation 
of the motor responses, irrespective of whether or not they involve learning.

6.3.3 THE NEUROCIRCUIT MAY NOT BE DEFINABLE. Another tradition of reductionism
in neurobiology, particularly in studies of invertebrates, has been
the notion that cells and their function are repeatedly identifiable. We have already
mentioned some of the problems in identifying function in Aplysia. The recent
computer simulations of simple neural networks relating to the feeding system of
Aplysia have led to a similar conclusion that, “...tests done on individual neurons
can provide misleading information on the actual role of the neuron in generating
behavior.” Compare this quote with one from Mpitsos and Cohan, p. 538:
“...these findings indicate that the classic technique of driving a particular neuron
in order to assess its effect in evoking activity or a behavior may be an insufficient
criterion for identifying its functional role.” That is, a given neuron’s function
depends on the context of activity in which it takes part. But, given variability in the
activity in the firing patterns within such contexts or “mobile consensuses,” even
this might be an insufficient definition.

The neurocircuit for a behavior is misrepresented by even the most complete
mappings of identified neurons that we see in publications. Studies using voltage-sensitive
dyes show that weak, localized stimulation of sensory skin of the siphon
produces massive and variable activation of neurons in the abdominal ganglion of
Aplysia. As we have discussed of the simplified networks shown in Figure 1
for Pleurobranchaea, the connectivity of the actual circuit of interacting neurons is
quite  large.  The  larger  the  overall  pool,  and  the  greater  the  number  of  weak  synapses 
that  exist,  the  greater  will  be  the  possibility  that  the  actual  network  generating  a 
behavior  will  be  variable  and  undefinable. 

6.3.4  DIFFERENT  LEVELS  OF  LEARNING  WITHIN  DEFINABLE  SETS  OF  SYNAPSES. 

Let us assume for the moment that a small group of neurons can be isolated functionally
from the effects of other groups of cells. Can we then obtain sufficient
information about the network to define it completely by looking at the network
and knowing all of the connection parameters? We think not. Consider just one
example relating only to the strength of synapses. In our own neural network simulations,
the data indicate that synapses contain different forms of information.
One form of information (“knowledge”) is task-specific, relating to the computations
of one or more functions that the network must perform. Another form (“metaknowledge”)
has to do with the process by which that task was learned; it does not affect
the  network  performance  on  the  specific  tasks,  but  only  becomes  evident  when  the 
network  is  confronted  with  new  tasks.  These  conclusions  were  drawn  from  exper¬ 
iments  that  compared  learning  performance  in  networks  that  used  random  noise 
to  optimize  changes  in  synaptic  weights  against  networks  that  were  not  exposed 
to noise. Both types of networks were allowed to reach the same level of learning
on a given task, but the noise-exposed networks learned a subsequent task faster,
even when noise was not included during training of the second task, than networks
that did not use noise. Starting networks at different initial synaptic strengths at
the  beginning  of  a  training  session  yields  different  final  synaptic  settings,  but  all 
final  networks  perform  the  same  learned  task  equally  well.  Because  of  this,  Burton 
and  Mpitsos  initialized  networks  using  different  synaptic  strengths  and  thresholds. 
Examination  of  a  large  number  of  networks  at  the  end  of  the  first  training  session 
revealed that the two types of training methods did not generate statistically significant
differences in the means and standard deviations of the synaptic weight
settings. Both types of networks contained the same information for generating
equally accurate computations relating to the first task, but networks that were
exposed to noise contained further information that permitted them to perform
well on a second task. Each task has a particular error landscape associated with
it (see Figure 8 in Burton and Mpitsos and Figure 13 in Mpitsos for examples
of error landscapes and volumes). Burton and Mpitsos suggest that noise-exposed
networks sample these error structures more completely than networks that were
not exposed to noise. Thus, when confronted with new tasks whose error structures
have any similarity to those of the first task, the synaptic settings of networks exposed
to noise already contain information about the new task and are able to navigate
its error fields rapidly. By contrast, since networks that are not exposed to noise
contain less of such information, they are not able to navigate as rapidly through
the  new  error  structure. 

The  implication  of  these  findings  for  the  present  discussions  is  that  one  may 
look  for  changes  relating  to  a  given  task,  but  depending  on  the  conditions  un¬ 
der  which  that  task  has  been  learned,  the  aggregate  of  synapses  within  a  pool  of 
neurons  may  contain  different  types  of  information,  where  one  type  pertains  specif¬ 
ically  to  one  or  more  tasks  that  have  been  learned,  and  the  second  type  pertains  to 
more  general  conditions  that  do  not  affect  the  accuracy  of  the  first,  but  nonethe¬ 
less  may  camouflage  the  results  that  the  experimenter  is  seeking  to  identify.  The 
rabbit olfactory bulb may be a useful example to contrast with our findings. In this
structure, odor-specific information is stored spatio-temporally, but apparently all
neurons take part in expressing the code for each odor. Our simulation networks
can also be constructed to encode information relating to multiple tasks, but the
noise-induced changes in the network represent an informational abstraction that
goes beyond the information needed specifically to perform well on previously learned
tasks. Therefore, if our computer simulations of connectionist neural networks have
analogs in biological systems, the understanding of synaptic modification and the
information that the synapses contain cannot be deciphered simply by examining
the synapses themselves as they relate to only one task. In their studies of
Mauthner neurons, Faber, Korn, and Lin raise the related caveat, but for different
reasons, that “...although it is possible to derive generalized rules of the
operation  of  synapses,  their  variants  may  exert  a  major  role  in  shaping  the  behav¬ 
ior  of  complex  circuits.” 

Problems analogous to those described above and in the preceding two subsections
may have beset Lashley, whose unsuccessful attempts to identify the locus
of stored memories (engrams) in the cortex have been more inspiring and illuminating,
at least to us, than if he had found them. It is interesting that much of
neuroscience has followed the same course as Lashley. But now the search has been
on the cellular level in attempting to identify behavioral phenomena in terms of
single synapses and single neurons. It is also interesting that Pavlov, before Lashley,
was apparently discontent with the possibility that learning could be localized
to particular areas of the cortex since learning persisted in his animals even after
they had suffered brain damage (see Boakes, pp. 127-128).


6.4 “FUZZY” CONTROL

Thus,  the  “control”  we  seek  to  define  for  the  physiological  and  neurohumoral  as¬ 
pects  of  the  nervous  system  is  oblique  and  emergent  rather  than  being  crisply 
Euclidean  in  postulating  particular  causes  and  effects  as  would  be  expected  of  re¬ 
flexes.  One  feature  of  such  emergence  is  that  there  may  be  many  ways  to  do  the 
same  thing,  and  even  gradations  between  these  ways.  We  know,  for  example,  that 
under  some  conditions,  removal  of  a  neuron  from  acting  in  a  motor  pattern  can  be 
compensated by shifts in the activity of other neurons. Redundancy, arising from
information sharing among convergent pathways, compensates for error or failure
in some of its components, even if these components originally generated strong
control over the other members of the coactive group. Are neurohumoral systems
equally redundant, or does each of the ever-growing number of neurotransmitters being
identified daily have a unique task? Our own work leans heavily toward the
first of these possibilities. In the same sense that there may be “lazy” synapses
in neural networks, whose presence is required only under some conditions, are
there “lazy” or even unnecessary transmitters? Some of what we see in a given
system  may  represent  baggage  of  evolutionary  or  developmental  processes.  This, 
however,  provides  for  yet  another  form  of  variation  that  permits  possible  adventi¬ 
tious  incorporation  into  further  evolution  or  behavior. 


6.5  IS  OUR  VIEW  HOLISTIC? 

No.  Being  concerned  with  mechanisms  that  generate  global  behavior  is  not  nec¬ 
essarily  being  holistic.  In  our  approach,  global  behavior  depends  on  local  rules 
followed  by  individuals  acting  within  a  large  group.  It  is  these  rules  that  we  seek 
to identify, though there may be different rules that relate to global behavior directly.
Even in simple processes such as the building of sand-grain mounds and affine
transformations, the global consequences of local behavior are not predictable.
Nevertheless,  emergent  function  need  not  be  a  property  of  large  groups  of  neurons. 

It  is  interesting,  however,  that  one  of  the  best  examples  of  work  in  artificial 
intelligence  in  many  decades  employed  a  top-down  analysis  in  which  a  principle 
obtained  from  studies  on  the  behavior  of  whole  animals  was  used  to  gain  insight  into 
how  that  behavior  might  have  emerged  from  individual  neuronal  units.  The  work  we 
refer to is Klopf’s drive-reinforcement model of associative learning, which extends
Hebb’s rule to account for Pavlovian conditioning. Hebb’s rule states that, “When
an axon of cell A is near enough to excite cell B and repeatedly and persistently
takes part in firing it, some growth or metabolic change takes place in one or both
cells such that A’s efficiency as one of the cells firing B is increased.” Before Klopf’s
model,  computer  simulations  of  Hebb’s  rule  in  simple  networks  were  not  successful 
in  demonstrating  learning  that  mimicked  findings  in  biological  systems. 

Hebb’s rule may be interpreted as a three-cell network, one input cell for the
CS and one input cell for the UCS, both of which synapse on a common follower
cell (cell B). Klopf made the following crucial modifications to the rule to make
it work in such a simple system: (1) Temporal delay was added between the onset
of the CS and UCS. (2) Synaptic modification was made proportional to the rate
of change in the CS and UCS. (3) The follower cell (B) itself expressed a form of
behavior analogous to tendencies that may be observed in whole animals: Whole
animals seek to optimize some quality of their environment, such as avoiding pain
and enhancing pleasure. Klopf made the simple but crucial analogous assumption
that cells tend to optimize excitation and reduce inhibition. Additionally, to account
for excitation and inhibition, the follower cell received excitatory and inhibitory
terminals in its CS input pathway.

The methodology for training the network is the same as for training the whole
animal.  In  each  training  trial,  a  pulse  is  presented  to  the  CS  input,  which  initially 
produces  little  effect,  and  after  a  short  delay,  a  pulse  is  presented  to  the  UCS  input. 
The  only  parameter  that  is  arbitrarily  set  in  the  model  is  the  constant  for  the 
rate  of  learning.  Amazingly,  training-induced  changes  in  the  synaptic  effect  of  the 
CS  input  on  the  follower  cell  reproduced  all  of  the  known  Pavlovian  conditioning 
phenomena  in  experimental  animals  and  in  humans  (e.g.,  backward  conditioning, 
CS alone, UCS alone, trace conditioning, second-order conditioning, overshadowing,
blocking,  conditioned  inhibition,  etc.). 
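
The training procedure just described can be illustrated with a minimal sketch. It is not Klopf's published implementation: it assumes a single follower cell with one plastic CS weight and a fixed UCS weight, and it assumes that the weight change at each time step is the change in the follower cell's output multiplied by a delay-weighted sum of earlier positive changes in the CS input. The function names, pulse timings, delay coefficients, weight floor, and output clipping are all hypothetical choices made for illustration.

```python
def pulse(t, on, off):
    """Unit pulse that is 1.0 for on <= t < off and 0.0 otherwise."""
    return 1.0 if on <= t < off else 0.0


def train(n_trials, cs=(2, 5), ucs=(4, 6), w_cs=0.1, w_ucs=1.0,
          coeffs=(0.5, 0.3, 0.15, 0.075, 0.025), steps=12):
    """Return the CS weight after each trial (all constants are illustrative)."""
    history = []
    for _ in range(n_trials):
        x_prev = y_prev = 0.0
        dx_pos = []  # positive changes in the CS input, one entry per time step
        w_past = []  # CS weight that was in effect at each time step
        for t in range(steps):
            x_cs, x_ucs = pulse(t, *cs), pulse(t, *ucs)
            # Follower-cell output, clipped to the range [0, 1].
            y = min(1.0, max(0.0, w_cs * x_cs + w_ucs * x_ucs))
            dy = y - y_prev
            # Earlier positive CS changes, weighted by delay j (1 = most recent).
            drive = sum(c * abs(w_past[t - j]) * dx_pos[t - j]
                        for j, c in enumerate(coeffs, start=1) if t - j >= 0)
            dx_pos.append(max(0.0, x_cs - x_prev))
            w_past.append(w_cs)
            # Weight change = output change times delayed input change;
            # the excitatory weight is kept above a small assumed floor.
            w_cs = max(0.1, w_cs + dy * drive)
            x_prev, y_prev = x_cs, y
        history.append(w_cs)
    return history


forward = train(60)  # CS onset precedes UCS onset by two time steps
print(round(forward[0], 4), round(forward[-1], 4))
```

With these assumed settings the CS weight grows over trials and then levels off, because the increment produced at UCS onset shrinks as the weight grows while the decrement produced at UCS offset does not. Changing the `cs` and `ucs` timing tuples gives a simple way to explore other stimulus orders.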

The  model  has  now  been  extended  to  account  for  instrumental  conditioning. 
The  work  also  made  progress  in  resolving  the  long-standing  debate  relating  to  the 
theoretical  relationship  between  Pavlovian  and  instrumental  conditioning  since  the 
instrumental  conditioning  effects  in  the  model  emerge  from  Pavlovian  condition¬ 
ing.  Thus,  computational  methods  may  have  resolved  what  psychological  debate 
and  experimentation  in  biological  systems  have  not  been  able  to  do.  The  studies 
discussed  in  Section  8  pursue  the  same  rationale  of  using  simple  rules  to  lead  to 
understanding  of  global  effects. 


7.  DOES  A  THEORY  EXIST? 

At  least  three  important  principles  have  emerged  from  dynamical  systems  studies 
that  are  important  to  biologists:  (1)  The  notion  that  distributed  networks  can 
generate  attractors.  (2)  A  considerable  amount  of  information  about  a  system  can 
be  gained  from  bifurcation  analysis.  And  (3)  an  understanding  of  the  dynamics  of 
a  system  can  be  obtained  from  the  phase-space  geometry  of  such  attractors.  By 
these  methods,  it  is  possible  to  discover  much  about  a  system  without  having  to 
resort  to  the  difficult  if  not  impossible  task  of  uncovering  the  sets  of  equations  that 
actually  run  the  system. 

A long history of work has developed these ideas, from Poincaré to Lorenz,
Crutchfield, Farmer, Packard, Rössler, Ruelle, Takens, Swinney, Shaw, Yorke, and
others of the many recent contributors to the knowledge of nonlinear dynamics.
There are many theorems in the field of nonlinear dynamics, and there are many
discussions of how to handle the nonlinearities, as well as beautiful demonstrations
of attractor topologies, bifurcations, and stability analyses, when these are in fact
available. As important as these are, they do not constitute a unified theory, at
least not as it might apply to brain function, though Bak and coworkers suggest
that their mathematics or models of self-organizing criticalities,
which  apparently  account  well  for  many  physical  and  biological  phenomena,  may 
provide  an  encompassing  dynamical  theory. 

One way to get around the theoretical problems, as is often suggested by physiologists
and non-physiologists alike, is to perform computer simulations on systems
whose state space is completely defined and parameterized, that is, to determine all
of the connections between neurons, membrane properties, neurotransmitters, firing
thresholds, and the like. However, one look at the complexity of the connections and
at the wide divergence and convergence occurring in even “simple” systems should
provide convincing evidence that this approach is hopeless. Moreover,
as discussed above, the reductionist neurocircuits that have been developed over
the years to account for behaviors are but a caricature of the actual “networks” that
generate the behaviors in intact animals.

The  possibility  might  also  be  suggested  that  insight  into  the  integrative  prin¬ 
ciples  might  be  obtained  from  the  mathematics  describing  the  biological  systems. 
This  also  seems  an  unlikely  possibility  at  present,  even  in  relatively  small  systems. 
Even in well-defined experimental systems, the first evidence of dynamical states
and their bifurcations came from direct observations. One such example is the
Belousov-Zhabotinsky reaction, which consists of about 30 chemical constituents in
which malonic acid is oxidized in an acidic bromate solution. While it may
be possible to define the various reactant species and list the reactions, it has not
been possible, to our knowledge, to predict the dynamics of the system using the
mathematics of the reactions. Another example is the demonstration of different
dynamical states in yeast glycolysis. As yet another example, near the turn of
the century, Duffing extensively studied damped-driven oscillators, yet the full force
of the dynamics in his simple model system was not uncovered until recently using
computer simulations. The first instance of persistent chaos in a simple mathematical
model of fluid convection, reported in Lorenz's landmark paper, was found
accidentally in computer simulations, not theory.

Finally, even the application of extant dynamical systems tools to time series
of experimental data provides little recourse. These tools have largely been developed
using simple models whose responses can be generated sufficiently long to
obtain an indication of their dynamics. Biological responses, by contrast, are often
extremely short lived. For example, chewing and swallowing behaviors in humans
as in Pleurobranchaea may be generated by robust attractors, but so few cycles
are generated that characterization of their dynamics, whether they be limit-cycle
or chaotic attractors, is not possible. Even in ideal systems, a certain amount of
guess-work needs to be done. For example, the Grassberger-Procaccia algorithm
can significantly overestimate the attractor dimension of limit cycles and underestimate
it for chaotic systems, particularly as the dimension increases, even for
model systems such as the Rössler hyperchaos.
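
To make the dimension estimate concrete, the following minimal sketch computes a correlation sum in the manner of the Grassberger-Procaccia approach and fits the slope of log C(r) against log r. It is an illustration under stated assumptions rather than a production estimator: the function names, the choice of radii, and the noise-free test set (400 evenly spaced points on a unit circle, standing in for an ideal limit cycle) are all hypothetical. Real data would add the difficulties discussed above, namely short records, noise, and the choice of scaling region.

```python
import math


def correlation_sum(points, r):
    """Fraction of distinct point pairs separated by less than r."""
    n = len(points)
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(points[i], points[j]) < r)
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) against log r over the given radii."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_sum(points, r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Assumed test case: evenly spaced points on a unit circle (an ideal limit cycle).
circle = [(math.cos(2 * math.pi * k / 400), math.sin(2 * math.pi * k / 400))
          for k in range(400)]
print(correlation_dimension(circle, [0.05, 0.1, 0.2, 0.4]))
```

For this clean, well-sampled curve the fitted slope comes out close to 1. The estimate depends on the radii chosen for the fit, which is one source of the guess-work mentioned in the text.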

The  positive  side  of  all  of  these  problems  is  that  biology  stands  on  an  exciting 
albeit  difficult  threshold  of  growth  in  theories  and  concepts.  And  it  is  biology  that 
will  force  further  development  of  dynamical  tools.  The  work  of  Ellner  and  coworkers 
on nonparametric methods to calculate Lyapunov exponents is an example.


8. COMPUTER SIMULATIONS: MINIMAL MULTIFUNCTIONAL
NETWORKS

Computational analogies may provide insight where theory is lacking. Lorenz's
work on convection provides an excellent example of how computer simulations
may spark insight into new methods for handling complex systems. The work of
Klopf and coworkers, which was discussed above under Reductionism, is another
example in which computational methods have proved decisive in addressing
an  important  problem  in  the  theory  of  learning.  In  Lorenz’s  case,  the  outcome  was 
unexpected.  In  Klopf's  case,  the  outcome  was  planned  because  of  the  equivalence 
of  the  statement  of  drive  reinforcement  at  both  the  unit  and  global  levels.  Both 
of  these  examples  show  that  certain  statements  or  assumptions  about  interacting 
systems  can  be  used  to  address  complex  behavior  through  computational  methods 
without  having  first  to  develop  a  proved  theory  about  the  global  system.  Put  dif¬ 
ferently,  given  certain  assumptions  about  local  events,  it  may  be  possible  to  allow 
the  system  to  generate  itself.  In  the  same  way,  we  discuss  here  four  topics  that 
may  be  addressable  computationally  and  which  may  eventually  prove  beneficial  in 
understanding  some  of  the  complexities  of  biological  organization. 


8.1  NONLINEARITIES  AND  BIFURCATIONS  IN  SIMPLE  NETWORK 
ARCHITECTURES 

As we have referred to repeatedly above, we do not yet understand the functional
meaning of convergence and divergence beyond the notion of reflexes,
or as Sperry put it, of the “three-bodies problem.” In studies of associative
learning and motor pattern generation, there is as much need now for a new language
to handle the emergent properties arising from convergence as there was fifteen
years ago. But we can point at least to two small interrelated advancements:
identification  of  the  nonlinear  interactions  that  arise  from  network  architectures, 
and  the  identification  of  architectures  that  permit  bifurcations  to  arise  from  such 
interactions.  The  discussion  below  uses  several  model  systems  to  clarify  what  we 
mean,  and  to  inquire  into  the  problem  of  continuous  versus  discrete  processes  in 
neuronal  activity. 


8.1.1 NONLINEARITY AND BIFURCATION IN MODEL SYSTEMS. Rössler and logistic.
Nonlinearities are easy to see in simple models such as the Rössler system of
coupled ordinary differential equations that generate complex chaotic dynamics:

$$\frac{dx}{dt} = -y - z, \qquad \frac{dy}{dt} = x + ay, \qquad \frac{dz}{dt} = b + z(x - c),$$

where a, b, and c are constants. Here x is a function of y and z, y is a function of
x and itself, and z is a nonlinear function of itself and x. Each of these variables
is expressed nonlinearly through the others. The logistic equation,
$x_{n+1} = R x_n (1 - x_n)$,
is an even simpler example, where the new value on the left is generated by
the  nonlinear  drive  of  the  previous  value  on  the  right  (initialized  between  0  and  1), 
and  is  then  reintroduced  into  the  system  to  generate  the  subsequent  number.  For 
values of the constant R between 0 and about 3.57, the process of nonlinear action
followed by recursive folding back into the equation produces periodic sequences of
numbers, but for R greater than about 3.57, the system can generate chaotic sequences.
Successive,  linear  adjustments  to  a  constant  such  as  R  may  produce  only  minor 
changes  in  the  system  over  a  large  portion  of  R's  allowable  range.  But  at  critical 
points,  very  small  alterations  in  R  produce  nonlinear  shifts  (bifurcations)  in  the 
sequence of numbers. At low R-scale resolutions, regions are observed at which
only  chaos  appears  to  occur.  By  expanding  the  R-scale,  one  observes  that  chaotic 
regions  contain  periodic  regimes. 
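
The behavior of the logistic equation described above can be checked with a short sketch. The helper names, the starting value, the transient length, and the tolerance are assumptions chosen for illustration. The code iterates the map, discards an initial transient, and reports the smallest repeat period it can detect, or `None` when no short period is found.

```python
def logistic_orbit(r, x0=0.2, transient=2000, keep=64):
    """Iterate x -> r*x*(1-x), drop the transient, return the next `keep` values."""
    x = x0
    for _ in range(transient):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def detect_period(orbit, max_period=16, tol=1e-9):
    """Smallest p such that the orbit repeats every p steps, or None."""
    for p in range(1, max_period + 1):
        if all(abs(orbit[i + p] - orbit[i]) < tol for i in range(len(orbit) - p)):
            return p
    return None


for r in (2.5, 3.2, 3.5, 3.9):
    print(r, detect_period(logistic_orbit(r)))
```

Stepping R through 2.5, 3.2, and 3.5 moves the orbit from a fixed point to a 2-cycle and then a 4-cycle, while at R = 3.9 no short period is detected. A finer scan of R between these values is one way to look for the periodic windows inside the chaotic regions.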

Bifurcation in Hodgkin-Huxley membrane. Teresa Chay’s seminal paper examined
a three-variable Hodgkin-Huxley membrane precisely in this way. The time
variation of voltage in the model is given by

$$\frac{dV}{dt} = g_I^* m_\infty^3 h_\infty (V_I - V) + g_{K,V}^* n^4 (V_K - V) + g_{K,C}^* \frac{C}{1 + C} (V_K - V) + g_L^* (V_L - V).$$

I: mixed inward currents (sodium, calcium). K,V: voltage-sensitive potassium channel.
C: internal calcium concentration. K,C: calcium-sensitive potassium current. L:
leakage. n: probability of opening K,V. m, h: probabilities of activation, inhibition.
g*: maximal conductance divided by capacitance.

The three variables in the system are (1) membrane potential (V); (2) n, the
probability of opening the voltage-dependent potassium channel; and (3) intracellular
concentration of calcium (C). Intracellular calcium is voltage-dependent,
as are sodium, one of the potassium channels, n, m, and h. It can be easily seen
mathematically that all of these variables affect one another through voltage (as a
consequence of their effects on currents), and that the system of such interactions
is highly nonlinear, although examination of the equations would not necessarily
give immediate insight into which parameters to use to control bifurcations. The
bifurcation parameter is the calcium-dependent potassium conductance g*K,C, and,
as described above for the logistic equation, the membrane produced many different
firing patterns when this conductance was systematically changed.

FIGURE 4 Cartoon of “minimal” neurocircuit transpositions of the three-variable
Rössler system of coupled differential equations (A) and of Chay's three-variable
Hodgkin-Huxley membrane (B). See text.
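
As a companion to the first model system of this subsection, the sketch below integrates the three-variable Rössler equations with a classical fourth-order Runge-Kutta step. It assumes the standard form of the system (dx/dt = -y - z, dy/dt = x + ay, dz/dt = b + z(x - c)). The parameter values a = b = 0.2 and c = 5.7 are commonly used illustrative choices, not values taken from the text, and the function names, step size, and starting point are likewise hypothetical.

```python
def rossler(state, a=0.2, b=0.2, c=5.7):
    """Right-hand side of the Rossler equations; the only nonlinear term is z*x."""
    x, y, z = state
    return (-y - z, x + a * y, b + z * (x - c))


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for an autonomous system."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))


def integrate(state=(1.0, 1.0, 1.0), dt=0.01, steps=20000):
    """Return the list of (x, y, z) states visited, including the start."""
    path = [state]
    for _ in range(steps):
        state = rk4_step(rossler, state, dt)
        path.append(state)
    return path


path = integrate()
xs = [p[0] for p in path]
print(min(xs), max(xs))
```

Treating c as a control parameter and repeating the integration for a range of values is a simple way to explore how the trajectory changes, in the same spirit as varying R in the logistic equation or the calcium-dependent potassium conductance in the membrane model.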

8.1.2 RELATIONSHIP BETWEEN BIFURCATION DYNAMICS AND NETWORK ARCHITECTURES.
To illustrate the difficulties encountered in attempting to understand
the dynamical capabilities of network architectures, and the direction we have taken
in some of our computer studies, consider the (overly) simplified cartoons in Figure 4
that transpose the Rössler system and the Chay membrane into “realistic”
analogs of neuronal networks. “Realistic” might include voltage-sensitive ion channels,
calcium-dependent ones, transmitter release dynamics, transmitter re-uptake,
second messenger systems, and other processes one might want to include in
an experimental system.

Given tonic excitatory input to X in Figure 4(A), and making X capable of
post-inhibitory rebound, it may be possible for X and Y, and X and Z, to oscillate
in opposition if there is sufficient accommodation in the firing of Z and/or Y. Figure
4(B) shows a network cartoon of a subset of the variables in the Chay membrane.
Given Chay’s simulations, it might be predicted that the synapse of KCa onto V
would provide access to bifurcation dynamics. The nonlinearities in the Rössler and
Chay systems are easily identifiable in the differential equations that compose them.
And  it  is  possible  to  see  how  the  calcium-dependent  potassium  conductance  can 
influence  the  dynamics  of  the  Chay  model.  But  it  is  considerably  more  difficult  to 
identify  analogous  nonlinearities  and  bifurcation  conditions  in  neuronal  networks. 
It  has  long  been  established  that  synaptic  activation  of  neurons  leads  to  nonlinear 
responses  because  of  the  firing  threshold  in  the  driven  neuron.  It  is  also  known  how 
to  simulate  individual  synapses  using  digital  integration,  by  describing  the  kinetics 
mathematically,  or  by  examining  nonlinear  interactions  between  different  types  of 
synapses. But  the  dynamical  implications  of  different  network  architectures  and 
of  the  synapse  characteristics  that  affect  the  dynamics  of  regenerative  electrical 
activity  of  neurons  in  these  networks  are  problems  that  remain  largely  untapped. 

Along  this  line,  present  efforts  in  our  laboratory  are  aimed  at  understanding 
what  types  of  converging  and  diverging  centers  in  minimal  networks  are  required 
for  bifurcations  to  occur.  In  the  same  way  as  Chay  used  the  calcium-dependent 
potassium  conductance  to  control  the  bifurcations,  our  efforts  are  to  determine 
whether synaptic strengths can also be used as bifurcation parameters. The problem
facing us in dealing with the biological system is much more difficult than that which
faced Chay because: (1) Our system has many more degrees of freedom. (2) Our
system is not as smoothly continuous as the Hodgkin-Huxley membrane; i.e., the
membrane responses may seem continuous, but cells usually receive information
in short pulses or bursts. (3) There are no previous network examples for us to
follow in which bifurcations have been demonstrated. Interestingly, the types of
convergence centers that have proved capable of bifurcating into variable activity
in our preliminary computer simulations are ones with structures similar to the
one shown in Figure 4(b).

As our knowledge grows of the connectivity among the BCNs and of their connections
with other neuronal groups, we shall construct computer simulations of networks
having increasing sizes. We shall then progressively introduce the effects of the
many converging neurotransmitter systems. Additionally, by implementing early
behavioral evidence of synaptic competition during learning in Pleurobranchaea
and the evidence for synaptic competition in mammalian cortex, we expect to
see our networks remodel their connections over time. Interactive groups may actually
grow or shrink in time; large populations may split into subsets; the spatial
boundaries between coactive groups may move in time; and network architectures
may emerge that affect the amount of variation occurring in the network.

8.1.3 CONTINUOUS VS. DISCRETE PROCESSES. The Rössler and Chay models are
both three-variable systems, as required of any continuous bounded system that is
capable of generating chaos. We summarized the reasons behind the need for three
variables using mixing of trajectories in three-space and an examination of Lyapunov
exponents in subsection 3.5.2. By contrast, discrete processes can generate
chaos in one dimension, as in the case of the logistic equation, and coupled discrete
processes can generate chaos in two-space, as shown by the Hénon system, where
$X_{n+1} = 1 - aX_n^2 + Y_n$ and $Y_{n+1} = bX_n$. Recall also that the issue is not whether
a system generates chaos, but its ability to exhibit both simple and complex behaviors,
depending on bifurcation conditions that arise from simple quantitative
alterations rather than from qualitative changes in network structure. Moreover, if
the bifurcation parameter is the driving frequency of an input signal, it is not necessary
even for quantitative changes to occur in the network for simple and complex
dynamics to appear.
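
As a rough illustration of the discrete systems mentioned above, the following Python sketch iterates the one-variable logistic equation and the two-variable Hénon system. The parameter values (r = 3.2 and r = 3.9 for the logistic equation, a = 1.4 and b = 0.3 for the Hénon system), the initial conditions, and all function names are assumptions chosen for illustration; they are not taken from the studies discussed here.

```python
def logistic_map(x, r):
    """One step of the one-variable logistic equation x -> r*x*(1 - x)."""
    return r * x * (1.0 - x)


def henon_map(x, y, a, b):
    """One step of the Henon system: X' = 1 - a*X^2 + Y, Y' = b*X."""
    return 1.0 - a * x * x + y, b * x


def iterate_logistic(x0, r, steps):
    """Return the orbit [x0, x1, ..., x_steps] of the logistic equation."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic_map(xs[-1], r))
    return xs


def iterate_henon(x0, y0, a, b, steps):
    """Return the orbit [(x0, y0), ..., (x_steps, y_steps)] of the Henon system."""
    pts = [(x0, y0)]
    for _ in range(steps):
        pts.append(henon_map(*pts[-1], a, b))
    return pts


def distinct_tail_values(values, tail, digits=6):
    """Count distinct values (after rounding) among the last `tail` entries."""
    return len({round(v, digits) for v in values[-tail:]})


if __name__ == "__main__":
    # Only the quantitative parameter r changes between the two runs.
    for r in (3.2, 3.9):  # illustrative values, not from the source
        orbit = iterate_logistic(0.3, r, 1200)
        print("r =", r, "distinct values in tail:", distinct_tail_values(orbit, 200))
    print("last Henon point:", iterate_henon(0.0, 0.0, 1.4, 0.3, 1000)[-1])
```

The point of the comparison is that the same one-line rule settles onto a short cycle for one value of r and wanders over many values for another, with no change in the structure of the rule itself.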

The difference between continuous and discrete processes is of significance to
neurobiologists. The neural network studies of Mpitsos and Burton indicate
that when signals between networks are chaotic discrete processes, simple networks
are able to perform difficult tasks on these signals that would otherwise require
more complex networks if the mode of transmission used continuous
periodic or continuous chaotic processes. Continuous processes are used in neural
integration, but the usual mode of information transfer is through trains of action
potentials. Trains of action potentials in pacemaker firing cells are generated by
continuous fluctuations in membrane potentials and in the dynamics of ionic species.
Examples may be found in computer simulations of the parabolic burster neuron
R15 in Aplysia and in the Chay model described above. The information in these
spike trains, though generated by continuous processes, is in a pulse code. Therefore,
there are a number of questions that need examination. For example, is there an
informational difference between the dynamics of spike trains and the
information contained in the continuous membrane processes that generate them?
What happens in postsynaptic cells when they receive such spike trains, and when
are we to consider the dynamics in the postsynaptic cells as continuous processes
or analogs of discrete processes? The membrane potentials of these follower cells
may appear continuous, but they are driven by discontinuous input events.
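
To make the distinction between a continuous generating process and its pulse code concrete, the sketch below converts a sampled continuous signal into a list of spike times by detecting upward threshold crossings, and then computes the interspike intervals. The sinusoidal "membrane potential", the threshold of 0.5, the sampling step, and the function names are all hypothetical choices for illustration; real pacemaker dynamics such as those of the Chay model would replace the sinusoid.

```python
import math


def spike_times(signal, dt, threshold):
    """Return the times of upward threshold crossings in a sampled signal."""
    times = []
    for i in range(1, len(signal)):
        # A "spike" is registered when the signal rises through the threshold.
        if signal[i - 1] < threshold <= signal[i]:
            times.append(i * dt)
    return times


def interspike_intervals(times):
    """Return the differences between successive spike times."""
    return [t1 - t0 for t0, t1 in zip(times, times[1:])]


def sine_signal(frequency, dt, n_samples):
    """Hypothetical continuous signal: a sinusoid sampled every dt seconds."""
    return [math.sin(2.0 * math.pi * frequency * i * dt) for i in range(n_samples)]


if __name__ == "__main__":
    v = sine_signal(frequency=5.0, dt=0.001, n_samples=1000)
    t = spike_times(v, dt=0.001, threshold=0.5)
    print("spike times:", t)
    print("interspike intervals:", interspike_intervals(t))
```

The continuous signal here has 1000 samples, whereas the pulse code retains only a handful of event times; analyses applied to the two representations need not give the same answers.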

The differences between discrete and continuous processes pose problems in
numerical analyses. Experimental data usually consist of the time series of one
or several dependent variables, but the methods provide little knowledge of the
number of dependent variables that actually drive the system. Numerical methods
provide some help. For example, it is possible to conduct phase-space analyses that
give information about the topological dimension of attractors and about the number
of dependent variables (embedding space) that may be involved in generating
the attractors. The evidence provides some support for chaotic
attractors and a low-dimensional embedding space.
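
One common way of carrying out such phase-space analyses on a single measured variable is delay-coordinate embedding, in which each point of the reconstructed space is built from lagged copies of the time series. The sketch below assumes this approach; the text does not state which embedding procedure was used, and the function name, dimension, and lag are illustrative.

```python
def delay_embed(series, dimension, lag):
    """Build delay vectors (x[i], x[i+lag], ..., x[i+(dimension-1)*lag])."""
    n_vectors = len(series) - (dimension - 1) * lag
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension and lag")
    return [
        tuple(series[i + k * lag] for k in range(dimension))
        for i in range(n_vectors)
    ]


if __name__ == "__main__":
    # A toy series; in practice this would be a measured time series.
    print(delay_embed([0, 1, 2, 3, 4, 5], dimension=3, lag=2))
```

Dimension estimates are then computed on the resulting set of vectors for increasing values of `dimension`, looking for the value beyond which the estimate stops changing.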

However, some of the calculated attractor dimensions were lower than two,
posing some difficulties in interpreting what the dynamics are. Continuous chaotic
systems must have at least three Lyapunov exponents; there must be at least two
non-negative ones, one being positive, as required for chaos, and one having zero
value, as required by Haken’s theorem (subsection 3.3.2). Given two non-negative
exponents, calculations using the Kaplan-Yorke conjecture lead one to expect that
the lowest attractor dimension for continuous chaotic systems is greater than two
(examples are given in Andrade et al.; Wolf). One-variable discrete processes,
such as the logistic equation, have dimensions less than 1. Two-variable discrete
processes have dimensions between one and two; our own estimate of the Hénon
system gives a dimension of about 1.36. Knowing the mathematical representation of a
system allows one to place such numbers in appropriate context, but experimental
data leave numerical results ambiguous. Do we assume that attractor dimensions
less than two reflect coupled discrete processes, or is it a problem with the analytical
methods? As for the latter possibility, with the available tools, whether they use time
series of a single variable or of all variables, attractor dimensions are difficult to
calculate even for model systems.
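
The argument from the Kaplan-Yorke conjecture can be sketched numerically. With the Lyapunov exponents sorted in decreasing order, the conjectured dimension is $D = j + (\lambda_1 + \dots + \lambda_j)/|\lambda_{j+1}|$, where $j$ is the largest number of leading exponents whose sum is non-negative. The exponent values used below are hypothetical round numbers chosen only to show the two regimes; they are not estimates from the studies cited here, and the function name is ours.

```python
def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke dimension from a collection of Lyapunov exponents."""
    lams = sorted(exponents, reverse=True)
    partial_sum = 0.0
    for j, lam in enumerate(lams):
        if partial_sum + lam < 0.0:
            # The first j exponents sum to a non-negative value;
            # interpolate the fractional part using the next exponent.
            return j + partial_sum / abs(lam)
        partial_sum += lam
    # The sum never becomes negative: dimension equals the number of exponents.
    return float(len(lams))


if __name__ == "__main__":
    # Hypothetical continuous chaotic system: one positive, one zero, one negative.
    print(kaplan_yorke_dimension([0.09, 0.0, -9.8]))
    # Hypothetical two-variable discrete system: one positive, one negative.
    print(kaplan_yorke_dimension([0.4, -1.6]))
```

With a positive and a zero exponent, $j$ is at least 2, so the result exceeds two however strongly the third exponent contracts; with only one non-negative exponent the result falls between one and two.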

Answers to questions such as the ones given above are necessary because they provide
an indication about how information is processed and encoded. We are presently
addressing them using numerical analyses of data from computer simulations of
membrane patches and of responses of cells in networks where we have access to all
parameters and variables of the system. Comparison of analyses on the data from
measurements of continuous variables and from spike trains may yield some insight
into implications relating to continuous and discrete processes.


8.2  RESPONSE  OPTIMIZATION,  ENERGY  GRADIENTS,  AND  ATTRACTORS 
IN  BIOLOGICAL  NETWORKS 

8.2.1 ATTRACTORS: FROM SEA SLUGS TO BEES. Real has shown recently that
bees are able to adjust their behavior so as to optimize the use of food resources.
Whether or not this involves gradients and attractors has not been addressed. The
idea is consistent with the possibility that biological networks (and biological systems
generally) may exhibit behavior that tends to minimize some gradient factor
(such as error or energy) through the ability of attractors to dissipate energy. Attractors
(see subsection 3.4) pull in any phase-space trajectory that falls within their
basin of attraction. Thus, for example, in limit cycles, externally applied perturbations
move the trajectory of the system away from the limit set, but if the state of
the trajectory remains within the attractor’s basin of insets, the trajectory will fall
asymptotically back into the limit set. Chaotic attractors also attract nearby states
but dissipate perturbations over their entire surface. We might say that attractors
minimize energy or error. Put differently, attractors optimize the match between
their attracting set and activity that falls near it. In either case, the action may be
considered a minimization process. On the behavioral level, bees are able to control
their foraging techniques so as to optimize the use of food resources.

8.2.2 COMPARISON THROUGH ANALOGY IN PRINCIPLES, NOT IN IDENTITY OF
MECHANISMS. The potential consequences of the identity between attractors and
optimization are rather interesting. Consider the following situations. In attempting
to simplify computer simulations, it is often difficult to determine exactly where to
limit the characterization of the biology. For example, the connectionist methods of
error back-propagation are usually faulted because of their obvious non-biological
nature. But the answers that come from the use of such networks depend on the
principles that are actually being simulated. The major driving element of error
back-propagation is that the system must follow a negative error gradient between
a teacher function and the output of the system. If the question being addressed
has to do with the principle of error reduction, rather than, say, what second messengers
might be involved in a cellular process, or how feedback actually occurs in
a real nervous system, the back-propagation method might give some insight into
how gradient-seeking systems store information in their distributed elements.

Response thresholds. Following this rationale, Mpitsos and Burton obtained
a  number  of  results  that  might  have  relevance  to  biological  systems.  They  found,  for 
example,  that  the  computational  capabilities  of  networks  are  severely  limited  when 
only  trainable  synaptic  strengths  are  used.  Adding  trainable  thresholds  significantly 
expands  the  computational  power  of  the  networks.  In  invertebrate  learning  studies, 
thresholds  (as  might  be  inferred  from  membrane  changes  in  postsynaptic  cells)  have 
either not been observed at the cellular level or have not been generally attended
to. Studies on long-term potentiation (LTP) in rats have, however, provided
evidence implicating response thresholds through synaptically induced
changes in the ratio of excitation and inhibition rather than changes in membrane
impedance. Heretofore, the methods used to test LTP have not focused on
assessing  the  computational  implications  of  threshold  adjustments,  nor  the  technical 
conditions  to  extend  the  findings,  but  it  would  be  extremely  interesting  to  determine 
whether  adjustments  in  the  ratio  of  excitation  to  inhibition  were  set  differently 
for  each  cell,  as  might  occur  in  gradient  descent  adjustments  in  thresholds  during 
learning  in  neural  networks. 
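
A minimal sketch of why a trainable threshold matters is given below. It trains a single sigmoid unit by gradient descent on squared error, once with only its two synaptic weights adjustable and once with an adjustable threshold as well. The task (logical AND), the learning rate, the number of epochs, and all names are assumptions made for illustration; the networks studied by Mpitsos and Burton were multilayer and are not reproduced here.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_unit(patterns, use_threshold, epochs=5000, rate=0.5):
    """Train one sigmoid unit by following the negative error gradient.

    Returns (weights, threshold, summed squared error after training).
    """
    w = [0.0, 0.0]
    theta = 0.0  # threshold; remains 0 when it is not trainable
    for _ in range(epochs):
        for (x1, x2), target in patterns:
            out = sigmoid(w[0] * x1 + w[1] * x2 - theta)
            # -dE/dz for E = (target - out)^2 / 2
            delta = (target - out) * out * (1.0 - out)
            w[0] += rate * delta * x1
            w[1] += rate * delta * x2
            if use_threshold:
                theta -= rate * delta  # z depends on theta with a minus sign
    error = sum(
        (target - sigmoid(w[0] * x1 + w[1] * x2 - theta)) ** 2
        for (x1, x2), target in patterns
    )
    return w, theta, error


# Hypothetical teacher function: logical AND of two binary inputs.
AND_PATTERNS = [((0, 0), 0.0), ((0, 1), 0.0), ((1, 0), 0.0), ((1, 1), 1.0)]

if __name__ == "__main__":
    for flag in (False, True):
        w, theta, err = train_unit(AND_PATTERNS, use_threshold=flag)
        print("trainable threshold:", flag, "final error:", err)
```

Without a trainable threshold the unit's output for the all-zero input is fixed at 0.5 regardless of the weights, so the error cannot fall below 0.25 on this task; allowing the threshold to follow the same gradient removes that limit.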

Network size may be self-limiting. An unexpected finding in the studies of Mpitsos
and Burton was that increasing the number of neurons in a hidden layer or
interneuronal layer beyond a certain point slows learning and eventually causes the system
to cease learning; i.e., group size may be self-limiting. Limitation of group size has
been enforced algorithmically in simulations of mammalian cortex through synaptic
competition and inhibitory synapses. It is also conceivable, however, that
group size may be additionally limited by the gradient tendencies of attractors. If
the  findings  of  Mpitsos  and  Burton  hold  biologically,  the  slower  organizational  times 
of  large  networks  may  be  superseded  by  smaller  subsets  of  neurons  as  they  form 
attractors.  Once  sufficiently  formed,  the  attractors  themselves  may  restrict  group 
size,  partly  by  their  gradient  processes,  and  partly  by  learning-related  synaptic 
competition. To our knowledge, the network-forming aspects of synaptic competition
have been viewed only at the level of neuronal trophic factors and whether or
not  activity  occurs.  What  we  are  attempting  to  point  out  here  is  that  the  network 
not  only  generates  activity,  but  that  the  dynamics  of  this  activity  may  affect  the 
characteristics  of  the  network  architecture. 

A  similar  distinction  between  activity  and  dynamics  may  be  raised  in  studies 
of motor pattern switching. In a traditional sense, switching between patterns of
activity requires some network change or the introduction of activity in a controlling
neuron. We do not deny this possibility, but add that the notion of bifurcation
raises the discussion from the level of activity alone to a level involving dynamical
processes. Using John’s terminology, the former is a “switchboard” effect relating
to particular neuron(s), whereas the latter is an abstraction of the self-organizing
activity  in  neurons,  and  quite  likely  may  not  be  identifiable  in  network  structure, 
although  some  identifiable  structural  indices  may  be  obtainable  as  discussed  for  the 
studies  of  Figure  4. 

Metaknowledge and lazy synapses. Metaknowledge represents the ability of
networks to store different forms of information. We discussed it above in dealing
with  reductionism  (subsection  6.3.4),  and  we  believe  that  it  may  be  a  consequence  of 
gradient  tendencies.  Our  computational  studies  also  found  that  although  networks 
set  their  synaptic  weights  and  thresholds  at  optimum  levels,  many  of  the  synaptic 
weights  produce  little  effect  when  removed  from  the  network;  i.e.,  they  are  “lazy.” 
Mpitsos and Burton discuss a number of uses for such synapses. One of the most
interesting possibilities comes from somewhat different studies by Warren, who
showed that certain synapses may be deleted after training without significantly
affecting network performance on a previously learned task, but networks were
unable  to  learn  the  task  if  they  started  with  the  reduced  number  of  synapses  in  the 
first  place.  This  poses  interesting  problems  to  biologists  since  weak  connections  are 
often  observed  between  the  interactive  components  of  their  experimental  systems. 
The  tendency  in  the  past  has  been  to  dismiss  such  connections,  or  to  presume 
that  they  would  be  “pruned”  away  if  not  used.  Our  findings  along  with  Warren’s 
indicate  that  these  synapses  may  be  crucial  for  learning  new  tasks.  By  analogy  to 
computers, they might be considered as temporary registers that permit gradient
descent, but once the descent has been completed, they are no longer needed for
that task.

8.3 LOCAL ERROR MINIMA IN BIOLOGICAL ADAPTATION

The idea that a system tends to optimize its behavior has a somewhat different expression
in biological systems than it might have in computer simulations of connectionist
neural networks. With enough time and stable environmental conditions, we
can envision that evolutionary competition between organisms will produce changes
that best adapt the species to the environment. One might think of the process as
reaching an absolute error minimum between the response of the organism and
the best possible response under the imposed conditions. Any response that is not
optimal represents a local minimum. In neural networks, methods have been developed
(see subsection 3.5.3) to avoid local minima using, for example, simulated
annealing and time-invariant noise algorithms (TINA). Simulated annealing
usually involves exponential decay of noise over time. TINA adjusts noise as a
function of the amount of error that is produced when a system responds to its
input stimuli. This method, however, was chosen only as a vehicle to demonstrate
the idea of TINA. Other methods, not necessarily directly related to error feedback,
may also be used that retain time invariance. For example, our present approach
to implementing TINA in networks consisting of neurons having biologically realistic
characteristics is to adjust the probabilistic release of transmitter or to use
short-term activity-dependent learning rules such as sensitization to maintain
the flow in a given part of the network. Our goal is to assign certain facilitatory
responses to classes of neurons, and then to allow the actual pathway to emerge
dynamically. Low error would be represented by activity recurring through a particular
part of the network. As error increases, diffusely distributed feedback onto the
network would disrupt such preferentially frequented pathways, permitting others
to emerge. If these new pathways lead to low error, feedback decreases, allowing the
flow through the pathway to continue. If attractors self-organize, the preferential
pathways would then be further entrenched, because, as discussed above, the basin
of insets to the attractor itself may represent an energy- or error-minimizing process.
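
The contrast between the two kinds of noise schedule can be sketched as follows. The first function lets the noise amplitude decay exponentially with elapsed time, as in simulated annealing; the second sets the amplitude from the current error alone, with no explicit dependence on time, which is the property described above for TINA. The proportional form of the error dependence, the decay constant, the gain, and all names are assumptions for illustration and are not taken from the published algorithm.

```python
import math
import random


def annealing_noise(step, initial=1.0, decay=0.01):
    """Noise amplitude that decays exponentially with elapsed time."""
    return initial * math.exp(-decay * step)


def error_scaled_noise(error, gain=1.0):
    """Noise amplitude set by the current error only (no time dependence)."""
    return gain * abs(error)


def noisy_descent(grad, error, x0, noise_fn, steps, rate, rng):
    """Gradient descent on one variable with additive Gaussian noise.

    noise_fn(step, current_error) returns the noise amplitude for that step.
    """
    x = x0
    for step in range(steps):
        amplitude = noise_fn(step, error(x))
        x += -rate * grad(x) + amplitude * rng.gauss(0.0, 1.0)
    return x


if __name__ == "__main__":
    # Hypothetical error surface with two minima of unequal depth.
    error = lambda x: (x * x - 1.0) ** 2 + 0.3 * x + 0.3
    grad = lambda x: 4.0 * x * (x * x - 1.0) + 0.3
    rng = random.Random(0)
    by_time = noisy_descent(grad, error, 1.0,
                            lambda s, e: 0.3 * annealing_noise(s), 2000, 0.01, rng)
    by_error = noisy_descent(grad, error, 1.0,
                             lambda s, e: 0.3 * error_scaled_noise(e), 2000, 0.01, rng)
    print("time-dependent noise ends at", by_time)
    print("error-dependent noise ends at", by_error)
```

In the error-dependent version the noise does not die away merely because time has passed: it persists as long as error remains high and falls only when a low-error state is found, so the same rule applies at any moment in the life of the system.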

This  process  does  not  require  that  the  tendency  to  follow  a  gradient  actually 
reach  an  optimal  minimum,  or,  equivalently,  that  the  attractor  be  spatio-temporally 
a  robust,  stable  structure.  Biologically,  in  both  the  daily  behavior  of  organisms  and 
in their evolutionary succession, local minima are extremely important in generating
adaptive  responses.  Whatever  works  is  sufficient,  whether  the  response  is  optimal 
or  not.  Thus,  our  notion  of  an  adaptive  system  is  one  that  can  generate  different 
minima  that  can  be  addressed  rapidly,  and  exited  rapidly  if  they  do  not  meet  the 
need.  Indeed,  we  believe  that  it  is  from  the  ability  to  generate  many  local  minima 
that  multibehavioral  networks  may  have  evolved. 

Part  of  the  understanding  about  the  generation  of  local  minima  will  be  to  see 
how  multibehavioral  networks  generate  different  attractors  in  computer  simulations. 
Transitions  between  different  attractors  may  yield  labile  intermediate  forms  that 
only  partially  resemble  more  stable  ones.  The  most  difficult  problem  that  we  face 
here is to determine how best to visualize temporal activity graphically for spike
trains. Continuous non-spiking processes pose less of a problem. Part of the
answer may also come from an understanding of spatio-temporal dynamics.

8.4 VISUALIZATION OF SPATIO-TEMPORAL DYNAMICS

The  more  we  study  biology,  the  more  it  seems  that  we  must  somehow  leave  it  to 
gain  a  feel  for  what  may  be  happening  there.  Put  simply,  biological  systems  are 
too complex and uncontrollable even to perform experiments such as those represented
by  Figure  4.  We  must  imbue  these  simulation  networks  with  as  much  biological 
information  as  needed  to  obtain  activity  that  somehow  resembles  the  activity  of 
the  biological  system.  But  complete  state-space  parameterization  of  the  biological 
system  is  beyond  hope,  as  one  glimpse  of  the  complexity  in  Figures  2  and  3  will 
show.  At  the  level  at  which  we  can  attribute  realistic  biological  characteristics  to  a 
network, the system becomes intractable even for simple analyses of steady states
(see example analysis of a simple model system in Andrade et al.).

Given the growing power of computer graphics and the increasingly easy access
to supercomputers, the recourse for biologists interested in the emergence of
group dynamics is to conduct the type of experiments shown in Figure 4, and, especially,
to visualize the spatio-temporal flow of activity in large-scale simulations
involving many interacting units. An understanding of such spatio-temporal flows
is, we believe, one of the central questions facing neuroscience. Walter Freeman and
coworkers were perhaps the first to begin a detailed account of spatially distributed
recordings in their studies of rabbit olfactory bulb (e.g., see review in Skarda and
Freeman). But even in these studies, the analysis of the temporal flow is of the
time series of single recording sites. Perhaps the major lesson in dynamical systems
work over the past decade has been the fact that much can be learned about the activity
of a system by the analysis of its phase-space geometry. Up to four variables
can be analyzed simultaneously using time series analysis (e.g., see Figures 8-11 in
Andrade et al., and Figure 13 in Mpitsos and Burton). We need to do the same
for many variables, both spatially and temporally.

By  such  methods  it  may  be  possible  to  examine  the  possibility  of  limit  cycles, 
chaotic  attractors,  SOCs  and  turbulence,  the  coexistence  of  multiple  attractors, 
movement  of  these  attractors  spatially,  and  possibly  even  their  blending  into  one 
another. It may also be possible to determine how particular circuit structures
emerge and how variability is controlled by particular circuit characteristics. In
the long term it will be important to ask how such structures are affected by system-wide
factors. If we are to believe our neurochemical findings, it is quite likely that
bifurcation  parameters  may  be  more  accurately  defined  as  being  distributed  over 
a  large  number  of  cells  rather  than,  for  example,  in  the  conductance  modification 
of  a  single  cell.  The  first  possibility  may  explain  the  fact  that  some  systems  are 
relatively  insensitive  to  changes  in  only  a  few  of  their  components. 


9  CONCLUSION 

In answer to the title of this paper, we have actually said little about what sea
slugs can tell us explicitly about the neurointegration of specific human movement.
But we believe that the findings tell us a considerable amount about what must be addressed
in order to gain a unified perspective of biological integration that might eventually
affect how we view human movement. We understand that much has been said appropriately
by others about coordination of limbs in invertebrates and vertebrates,
the rightful importance of FAPs, and selective control of individual neurotransmitters
on pattern generation and in the formation of network structure, and that such
findings may be applicable to human motor behavior. Perhaps most of the time all
of these studies provide the best answers, as most of the time Newtonian physics
provides the right answers in daily engineering problems. Perhaps also, the neurointegrative
processes in Pleurobranchaea and Aplysia follow the same predictabilities
most of the time.

The instances that are not explainable by traditional neurocircuit perspectives
might be dismissed as biological aberrance. Alternatively, owing to the fact that the
animal seems to function well enough with them, they may be pursued as being of
adaptive significance. We have followed the latter route, and have been forced into
a perspective that is more statistical mechanical and dynamical than classically
“switchboard.” Lorenz voiced the long-held view that all biological information
is stored in structure. We hardly disagree with that. But the question is, how
do we read that information, and is much of it redundant and even of nonsense
or accidental value? The latter possibilities may actually provide certain adaptive
value adventitiously in ever changing and unpredictable environments. In reaching
a new theoretical perspective that addresses these issues, our view is that there are
two levels of solution: the special case, relating to the switchboard neurocircuit,
and the general solution, which must be reducible to the special case but must also
provide a general theoretical foundation that is extensible to many other cases.

The shift to dynamics, or at least away from answering all questions by using
reflexes, marks a shift away from mechanism to organization. Although each biological
level of organization may express the dynamics in its own processes, the
dynamical principles may be applicable to all levels of organization. The central
question in all of these systems is “How does the individual influence the group,
and, in turn, how does the group influence the actions of the individual?” We have
tried as much as possible to ground our ideas in biological findings, though much
more data needs to be gathered (and re-gathered) before we feel more comfortable.
If what we have discussed is accurate, then, as Barbara McClintock envisioned, “We
are going to have a new realization of the relationship of things to each other.”

Mapping from Speech Acoustics to Tongue
Dorsum  Movement:  An  Application  of  a 
Multilayer  Perceptron 


INTRODUCTION 


In this lecture we will illustrate the application of nonlinear neural network techniques
to a large-scale problem of some importance. While the strategies and methods
we adopted may not be exactly the correct approach for other situations, we
suggest  that  a  detailed  examination  of  how  one  complex,  nonlinear  problem  was 
solved  will  facilitate  the  search  for  good  solutions  to  other  similar  problems.  The 
focus  will  be  on  how  we  managed  to  reach  a  satisfactory  result  with  the  technique, 
rather  than  on  the  theoretical  benefits  of  the  method. 

The problem explored in this paper is that of finding a map from speech acoustics
to the movement of the speech articulators. If we were successful in this task,
we could then produce a simulated X-ray of the movements of the tongue during
speech. This would be a substantial aid to the deaf since such a display would permit
a kind of “lip reading” of the tongue. It would also benefit the field of speech
therapy, where knowledge of tongue movements would help to monitor and guide
treatment. As an instantiation of the problem, we have chosen to map to the vertical
movement of the tongue dorsum, which we define as a point on the midline upper
surface of the tongue 30 mm behind the tip, a point which generally touches the
roof of the mouth near the posterior boundary of the hard palate when a speaker
forms consonants like /k/ and /g/.

We  also  view  our  research  as  a  first  step  toward  a  device  that  will  recognize 
natural,  continuous  speech  produced  at  different  rates  by  different  speakers.  Our 
approach is based on a substantial body of theoretical and experimental
evidence  that  suggests  people  perceive  speech  by  extracting  from  the  acoustic  signal 
information  regarding  the  vocal  tract  gestures  that  were  used  to  produce  the  speech. 
According  to  this  approach  all  sounds  in  a  particular  language  can  be  represented 
by  a  unique  pattern  of  gestural  units.  For  example,  the  difference  between  /k/  and 
/g/  is  that  the  velar  closing  gesture  is  accompanied  by  a  glottal  adduction  in  /g/  but 
not  in  /k/.  If  we  could  detect  the  motion  of  the  tongue  dorsum  during  speech,  we 
would  be  able  to  identify  the  occurrence  of  a  velar  closure  gesture,  and  would  have 
developed  a  system  that  might  be  extended  to  other  gestures.  This  in  turn  would 
eventually  enable  us  to  detect  the  unique  combination  of  gestures  that  specify  each 
utterance  in  the  language.  One  advantage  of  gestural  units  for  speech  recognition 
is that gestures are relatively invariant across speaking rate. For a more detailed
discussion of the mapping problem, as well as its implications for recognizing speech
gestures, see Papcun et al.

We  chose  to  attack  the  problem  with  neural  networks  because  we  expected  the 
relationship  between  acoustics  and  tongue  positions  to  be  extremely  complex,  and 
because  previous  attempts  to  map  from  speech  acoustics  to  vocal  tract  position 
by analytic^ or other techniques*'*® have not been particularly successful. Also,
multilayer perceptrons (MLP) have been used successfully to map from speech
acoustics to phonemes'*'*'’ and to words.^ Therefore, we elected to use a feed-forward,
fully interconnected MLP to attempt to map from speech acoustics to the
underlying  motion  of  the  articulators  that  produced  that  speech. 


INPUT  AND  OUTPUT 

Data were collected at the University of Wisconsin’s Waisman Center’s X-ray microbeam
facility. The microbeam tracked the movement of a 2.5 mm gold pellet
attached  to  the  tongue  dorsum  during  natural  speech.  The  acoustic  signal  was  also 
simultaneously  recorded.  We  recorded  three  male  speakers  repeating  monosyllabic 
words  three  times  each  in  lists  of  eight  words.  Each  list  formed  a  record  and  took 
20 sec to complete. The words contained one of five vowels: /ʌ/ as in hud, /æ/ as
in had, /ɪ/ as in hid, /i/ as in he, or /o/ as in hoe. These were preceded either by a
glottal fricative /h/, an unvoiced velar stop /k/, or an unvoiced alveolar stop /t/.
After the vowel, the words were ended either by an unvoiced velar /k/, a /ks/, or by
a  voiced  alveolar  /d/.  Words  beginning  with  /h/  had  no  final  consonant.  Examples 
of  the  words  are  he,  cud,  toke,  tucks,  and  tax.  These  words  were  selected  because 
they  featured  velar  stops  in  a  variety  of  vowel  contexts. 

As  in  any  use  of  neural  network  techniques,  a  crucial  decision  was  how  to 
represent  the  input.  A  good  representation  contains  the  necessary  information  for  a 
successful  map  and  eliminates  unnecessary,  confusing,  or  irrelevant  information.  The 
speech signal contains large amounts of this latter type of information, such as that
related to the loudness of the speech signal, and the sex and identity of the speaker.
However, as usual in the application of neural networks, we do not know exactly
what aspect of the input will contribute to a successful map; if we did have this
exact knowledge, we would probably not have selected a neural network approach,
but rather would have used a more structured artificial intelligence technique.

The representation problem can be summarized as a choice between two different
philosophical approaches: (1) the know-nothing approach and (2) the know-it-all
approach. If we had followed the first, we would have presented the speech signal
in something close to its raw form and claimed that the network would adjust its
weights to ignore irrelevant information; if we had adopted the second approach, we
would have thrown out everything judged to be irrelevant and let the network examine
only a highly filtered signal that highlighted the supposedly critical features.
The advantage of the first approach is that we are fairly certain to have included the
necessary information, while the disadvantage is that all the excess input will slow
the finding of a solution or will permit a solution that is idiosyncratic to the training
set. This will cause a failure to generalize successfully. The advantages of the second
approach are that it is likely the features we have highlighted will actually be used
by the net, thus aiding generalization. Moreover, we could expect the reduced size
and complexity of the net to result in faster training. The disadvantages are that
faulty filtering of the speech will exclude key information and the net will fail to
learn or will learn a solution that will fail to generalize. Our strategy was to lean
to the know-nothing approach, but to filter the speech in ways that are well known
to be done by the peripheral auditory system, i.e., transform short time segments
to the frequency domain, and represent the spectral energy in log scales in both
power (dB scale) and frequency (bark scale).

The acoustic signal was sampled at 10000 Hz, low-pass filtered at .a0iHi Hz,
and adjusted to a mean of 0 to remove any dc bias. It was then segmented into
overlapping 12.8 msec sections, passed through a Welch window,''' and transformed
to the frequency domain by an FFT. The resulting power spectrum was converted
to a dB scale and then redistributed into 19 bark-scale categories to more nearly
represent the frequency resolution of the human auditory system. The first two bark
bins (0-200 Hz) were deleted because they primarily contain information about the
pitch of the speaker's voice. This helped to reduce the dimension of the input
vectors, thus simplifying the network by reducing the number of weights. After this
step, each record was represented by sequential 6.4 msec frames, each containing
17 numbers representing the power spectrum.
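
The front end described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original Fortran code: the band edges in BARK_EDGES are a commonly tabulated critical-band scale that is assumed here, the text does not say how dB values were pooled within a band (averaging is assumed), the parabolic window formula is one common definition of the Welch window, the low-pass filtering step is omitted, and all function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 10000   # Hz, from the text
FRAME_LEN = 128       # 12.8 msec at 10 kHz
HOP = 64              # 6.4 msec step between overlapping frames
# Assumed critical-band edges in Hz; 20 edges give 19 bands.
BARK_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
              1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300]


def welch_window(n):
    """Parabolic (Welch) window of length n, zero at both ends."""
    half = (n - 1) / 2.0
    return [1.0 - ((i - half) / half) ** 2 for i in range(n)]


def power_spectrum_db(frame):
    """Power in dB for DFT bins 0..N/2 of one windowed frame (naive DFT)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))
        out.append(10.0 * math.log10(abs(s) ** 2 + 1e-12))
    return out


def bark_frame(db_bins):
    """Average the dB values in each band, then drop the first two bands."""
    hz_per_bin = SAMPLE_RATE / FRAME_LEN
    bands = []
    for lo, hi in zip(BARK_EDGES[:-1], BARK_EDGES[1:]):
        vals = [v for k, v in enumerate(db_bins) if lo <= k * hz_per_bin < hi]
        bands.append(sum(vals) / len(vals))
    return bands[2:]


def acoustic_frames(signal):
    """Turn a sampled signal into a list of 17-number frames."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]   # remove any dc bias
    window = welch_window(FRAME_LEN)
    frames = []
    for start in range(0, len(centered) - FRAME_LEN + 1, HOP):
        chunk = [c * w for c, w in zip(centered[start:start + FRAME_LEN], window)]
        frames.append(bark_frame(power_spectrum_db(chunk)))
    return frames
```

Feeding a pure 1000 Hz tone through `acoustic_frames` should put the largest value in the band that spans 920-1080 Hz, which is a quick sanity check on the band bookkeeping.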

The next step was to normalize both the articulatory data and the acoustic
data processed as described above. Normalization can enhance the effectiveness of
the network by eliminating unwanted sources of variation, emphasizing crucial features
of the data, and pre-adjusting the data to the characteristics of the nonlinear
function used in the network. We have generally found that normalizing our input/output
values to within the 0-1 range provides a consistent setup for a variety
of network problems and thus helps to transfer insights gained in one problem to
other situations. For some problems we have scaled the input to a wider range to
take advantage of the interaction of the size of the input values with the initial
weights,  biases,  and  dynamic  range  of  the  nonlinear  sigmoid  function.  Output  val¬ 
ues  are  also  typically  normalized  to  within  the  0-1  range  to  permit  a  choice  of 
making  the  final  network  output  either  a  simple  sum  of  the  input  to  the  final  units, 
or  of  making  the  output  units  like  all  other  units  by  passing  the  sum  through  the 
sigmoid  function. 

For the problem at hand, we used normalization to accomplish five goals: (1) to
eliminate spectral tilt (the trend toward lower energy values at higher frequencies)
and thus help to equate differing recording environments; (2) to eliminate recording
glitches (where sporadic very large or small values were anomalously recorded);
(3) to ensure that each bark in the spectrum was given equal initial weighting;
(4) to equalize the loudness for different acoustic recordings of different subjects;
and (5) to make the range of tongue movements for different vocal tracts equivalent.
For these purposes the acoustic representation was normalized within barks, with
the highest 0.1% of the values in each bark of each record assigned a value of 1,
and the lowest 0.1% a value of 0. Other inputs were given proportionate values.
The tongue position values were normalized in the same fashion, but in this case
the limits were set so that the maximum value of each dimension of each pellet
was assigned a value of 0.9 and the minimum a value of 0.1. These values were chosen
because 1 and 0 are the limits of the nonlinear sigmoid function applied to the final
network output, and so are not realizable. Also, the values of 0.9 and 0.1 permit
the output for extreme patterns to over- or undershoot the target, thus facilitating
overall reduction of the error.
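
A minimal sketch of this within-bark normalization follows. It assumes that the 0.1% tails are found by sorting the values of one bark channel and that values beyond the cut points are clipped; the text does not spell out these details, and the function names are hypothetical.

```python
def percentile_limits(values, tail=0.001):
    """Return (low, high) cut points with roughly `tail` of the values beyond each."""
    ordered = sorted(values)
    k = int(len(ordered) * tail)   # count of extreme values at each end
    return ordered[k], ordered[len(ordered) - 1 - k]


def normalize_channel(values, tail=0.001, out_min=0.0, out_max=1.0):
    """Clip one channel at its tail cut points and rescale to [out_min, out_max]."""
    low, high = percentile_limits(values, tail)
    span = (high - low) or 1.0     # guard against a constant channel
    scaled = []
    for v in values:
        clipped = min(max(v, low), high)
        scaled.append(out_min + (out_max - out_min) * (clipped - low) / span)
    return scaled


def normalize_frames(frames, tail=0.001):
    """Normalize each bark channel (column) of a record independently."""
    channels = [normalize_channel(list(col), tail) for col in zip(*frames)]
    return [list(row) for row in zip(*channels)]
```

For the tongue positions, the same routine could be called as `normalize_channel(positions, tail=0.0, out_min=0.1, out_max=0.9)`, which maps the extremes to 0.1 and 0.9 as described above.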

THE  CONTEXT  WINDOW  CONCEPT  AND  THE  PROBLEM  OF  TIME 

A critical characteristic of speech that must be accounted for by any successful
application  of  network  technology  to  speech  problems  is  that  of  the  temporal  dis¬ 
tribution  of  the  speech  signal.  The  meaning  of  an  acoustic  event  at  any  one  time 
depends  heavily  on  the  acoustic  signal  which  surrounds  it.  Perhaps  the  simplest 
technique  for  capturing  this  temporal  dependency  is  to  cut  out  from  the  signal  a 
section of time that is presumed to contain the category of interest, and present
it  to  the  net  as  a  static  representation.  This  is  appropriate  if  the  goal  is  to  iden¬ 
tify  isolated  phonemes  or  words,  and  has  been  used  by  Burr^  and  Waibel.^'*  While 
this  solves  the  context  problem,  it  creates  serious  problems  of  segmentation  and 
alignment.^  The  segmentation  problem  is  particularly  critical  in  the  selection  of 
prototypes  that  are  subword  units,  since  phonemic  information  is  overlapped  in  the 
speech  signal^ ^  and  therefore  any  particular  short-term  segment  contains  informa¬ 
tion  about  several  phonemes.  For  word  or  syllable-length  prototypes,  segmentation 
can  be  artificially  accomplished  by  having  the  speaker  produce  the  tokens  in  isola¬ 
tion.  Unfortunately,  this  requires  that  the  user  of  the  trained  network  also  separate 
his  words  by  brief  pauses,  vastly  reducing  the  speech  recognizer’s  utility.  Further¬ 
more,  this  form  of  segmentation  creates  a  critical  alignment  problem  in  that  the 
test tokens must now be aligned in the input window in the same way as the training
tokens. How this is to be done automatically is not clear, though Burr' has
successfully used a modification of dynamic time warping, and many researchers
simply have aligned by hand,*® leaving the problem for future research.

In any case, since our problem is to map from a vector-valued time series to
a continuous curve, segmentation is not a viable option and alternative methods
for capturing temporal dependencies are needed. One option would be to introduce
recurrences into the network architecture so as to provide the network with a kind
of memory for past events. This method was applied by Watrous et al.*® to the
identification of phonemes; by Robinson and Fallside^^ to the speech coding problem;
and by Laszlo and Zahorian^^ to speaker identification. However, since this
technique involves complexities that we poorly understand, and presents problems
of how to prepare the input, we elected to use another technique: that of a context
window. In this technique a time window of n frames is passed, one frame at a
time, over the acoustic input, producing an n x Number_of_Frequency_Bins input
matrix associated with each desired output value. Thus, each of the tongue's
particular momentary positions is associated with an n x Time_Slice_Length segment
of speech acoustics. Whether this segment should be arranged symmetrically
around the moment at which a particular tongue position is serving as the desired
network output, or be arranged so that most or all of it is in the future or past
relative to the desired output, must be decided by empirical exploration or phonetic
analysis. A somewhat similar technique, called a time delay neural network,
was used by Waibel et al.^'* and Hampshire and Waibel.’* In their technique, each
hidden unit in the network received input from a context window involving only the
past, and each hidden unit was aligned to a time step in the acoustic input. However,
the input patterns were static in that they were segmented out of speech and
temporally aligned by hand, so this technique is not compatible with the present
objective. Nonetheless, the idea of a time delay unit on the second hidden layer
may contribute to further development of the context window technique employed
here.
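
The context window idea can be sketched as follows. The sketch assumes that frames and tongue positions are sampled at the same rate and that targets whose window would run past either end of the record are simply skipped; the names `context_window`, `training_pairs`, and `proportion_past` are hypothetical.

```python
def context_window(frames, target_index, window_size, proportion_past):
    """Flattened acoustic window for the tongue position at target_index.

    Returns None when the window would run off either end of the record.
    """
    n_past = int(round(window_size * proportion_past))
    start = target_index - n_past
    end = start + window_size
    if start < 0 or end > len(frames):
        return None
    return [value for frame in frames[start:end] for value in frame]


def training_pairs(frames, targets, window_size, proportion_past):
    """Slide the window one frame at a time and pair it with each target."""
    pairs = []
    for t, target in enumerate(targets):
        window = context_window(frames, t, window_size, proportion_past)
        if window is not None:
            pairs.append((window, target))
    return pairs
```

With 17 numbers per frame, a window of 50 frames gives an input vector of 850 values for each desired output value.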


THE  NEURAL  NETWORK 

Having already decided to use an MLP to learn the mapping, we first faced the
problem  of  how  to  implement  the  network.  Since  many  authors  had  warned  that  a 
major  practical  disadvantage  of  traditional  back-propagation  implementations  was 
that  training  could  take  a  very  long  time,*’^'*  we  early  decided  to  implement  the  net¬ 
work  on  a  CRAY  supercomputer.  This  led  to  a  decision  to  write  the  code  in  Fortran 
ourselves,  rather  than  modifying  existing  general  purpose  programs.  This  proved 
to  be  a  fortunate  decision,  because  by  using  the  program  optimization  utilities  of 
the  CRAY,  we  were  able  to  spot  ways  to  maximize  vectorization  for  this  partic¬ 
ular  problem,  and  to  then  easily  modify  the  code  for  faster  execution.  The  most 
important change in this respect was to read all the acoustic data into the program
once as a Number_of_Frequency_Bins x Number_of_Frames matrix. Then when
selecting a particular context window for input, we would point to the first frame
in that window and, by reading beyond the row boundary of the data matrix, implicitly
provide a one-dimensional vector of length Number_of_Frequency_Bins x
Context_Window_Size as an input vector for the net. If this pointer technique had
not been used, it would have been necessary to store each input/output pair separately;
since each pair in a typical run contained about 850 floating point numbers,
this  would  have  greatly  increased  the  amount  of  memory  required.  Such  tricks  can 
make  an  enormous  difference  in  the  time  and  memory  required  for  training,  and 
illustrate  that  successful  use  of  neural  networks  may  require  tailoring  the  code  to 
suit  your  problem,  as  well  as  tailoring  the  input  representation  to  suit  your  code. 
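
The pointer technique has a rough analogue in Python, shown here as a sketch rather than a rendering of the Fortran code. It assumes the record is stored once as a flat array with the values of each frame contiguous, and it uses a `memoryview` slice so that each context window is a view onto the stored data instead of a copy. The function names are hypothetical.

```python
from array import array


def flatten_record(frames):
    """Store all frames of a record once, frame after frame, in one flat array."""
    flat = array("d")
    for frame in frames:
        flat.extend(frame)
    return flat


def window_view(flat, n_bins, first_frame, window_size):
    """Zero-copy view of window_size consecutive frames starting at first_frame."""
    start = first_frame * n_bins
    return memoryview(flat)[start:start + window_size * n_bins]
```

Because the view shares memory with the flat array, a record of N frames costs N x 17 stored numbers no matter how many overlapping windows are drawn from it, whereas storing each input vector separately would cost about 850 numbers per pair.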


THE  TRAINING  PROBLEM— FINDING  A  GOOD  SET  OF  WEIGHTS 

Another  set  of  decisions  involved  the  selection  of  a  training  method.  Training  is 
here  defined  as  finding  a  good  set  of  weights  for  the  network — ones  that  will  map 
speech  acoustics  to  tongue  dorsum  position  for  words  not  in  the  training  set.  While 
training  time  itself  does  not  define  success,  it  should  be  short  enough  to  permit 
reasonable exploration of the network parameter space and to allow research into
the  best  form  and  size  of  the  training  set.  Thus,  achieving  rapid  training,  while 
not  crucial  to  successful  application  of  back  propagation,  is  a  sensible  goal.  If  we 
could  handcraft  weights,  and  thus  avoid  the  training  problem  entirely,  we  would 
do  so — but  in  that  case  we  could  have  probably  solved  the  problem  in  some  more 
efficient  way  and  would  not  be  using  neural  networks  at  all. 

We considered three approaches to training: gradient descent plus momentum,
implemented as a back-propagation algorithm^”; Quickprop, a modification in
which parabolic projections on the weight changes are used®; and conjugate gradient
optimization, in which the entire network is treated as a vector function to be
minimized.*® All three techniques, when applied to our task, were able to find a set
of  weights  that  would  map  the  training  input  to  the  desired  output  within  a  small 
RMSE.  However,  back  propagation  produced  the  best  overall  results  both  in  terms 
of  time  to  train  and  success  of  generalization.^^  Within  the  back-propagation  tech¬ 
nique,  we  used  three  methods  to  reduce  training  time:  (1)  use  the  best  hardware, 
(2)  optimize  the  code  for  that  hardware,  and  (3)  find  good  combinations  of  network 
parameters.  Of  these  methods,  1  and  2  do  not  affect  generalization;  however,  we 
have  found  that  good  training  parameters  are  generally  good  settings  for  successful 
generalization. 

Having decided to use a gradient descent method, we faced the question of how
large a step to take down the gradient at each weight update. This is the problem
of selecting a proper learning rate. In general, as is illustrated in Figure 1, there
is a U-shaped function that relates Learning_Rate to the time required to reach
the RMSE level set for termination. In the example illustrated, and in others
we have explored, there is a broad range of Learning_Rate values that produce
approximately equivalent results, with rapid deterioration of performance outside
this range. The strategy we adopted for choosing the size of the step in weight space
was to move down the gradient as far as possible consistent with a rough overall
decrease in the total error. Since we are only interested in finding a set of weights
that will produce a map with the required accuracy and not in how we found it,
we accept fairly erratic iteration-to-iteration movement of the RMSE in the hope
that one large step will move quickly to an adequate position. This strategy will be
more successful if there are large, easy-to-find regions of weight space that produce
adequate maps. If the solution set is a small portion of the space, then this sort of
erratic search will be proportionately less successful.

FIGURE 1 Mean cpu time to reach termination criterion as a function of the
Learning_Rate (Speaker 6, record 19 -> tongue dorsum vertical). Error bars
represent the range of five runs with different initial weights.

FIGURE 2 Mean cpu time to reach termination criterion as a function of
Momentum_Term. Error bars represent the range of five runs.

A second parameter closely related to learning rate is the Momentum_Term,
which determines the proportion of the last weight change to be added to the
current weight change. Again, we expect a U-shaped function relating the size of
the momentum term and the time required to reach termination criterion. Figure 2
shows this function for the case where Learning_Rate = 1.0. Peeling and
Moore^' found that for a static digit recognition task, the appropriate value of the
Momentum_Term depended on an interaction with the Learning_Rate value. We
would certainly expect this to hold here, too, but also expect that the shape of the
error surface in weight space produced by the particular problem will have an even
greater effect on the choice of a Momentum_Term. A large momentum term would
likely be helpful if the error surface were relatively flat and smooth, while harmful
if the surface were irregular with narrow valleys. At any rate, our tactic has generally
been to initially set the Momentum_Term to a relatively low value, then find a
good Learning_Rate, and finally do a rough search for a better Momentum_Term.
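
The weight update with a momentum term can be written out as a short sketch. The update rule follows the description above (a step down the gradient plus a proportion of the last weight change); the two-weight quadratic error surface in `steps_to_criterion` is a toy chosen purely for illustration and has nothing to do with the speech data, and both function names are hypothetical.

```python
def momentum_step(weights, gradients, previous_changes, learning_rate, momentum_term):
    """One update: step down the gradient plus a share of the last weight change."""
    changes = [-learning_rate * g + momentum_term * p
               for g, p in zip(gradients, previous_changes)]
    new_weights = [w + c for w, c in zip(weights, changes)]
    return new_weights, changes


def steps_to_criterion(learning_rate, momentum_term, criterion=1e-3, max_steps=10000):
    """Updates needed on the toy surface E = w0^2 + 10*w1^2 until sqrt(E) < criterion."""
    weights = [1.0, 1.0]
    changes = [0.0, 0.0]
    for step in range(1, max_steps + 1):
        gradients = [2.0 * weights[0], 20.0 * weights[1]]
        weights, changes = momentum_step(
            weights, gradients, changes, learning_rate, momentum_term)
        error = weights[0] * weights[0] + 10.0 * weights[1] * weights[1]
        if error < criterion * criterion:
            return step
    return None   # criterion not reached within max_steps
```

Calling `steps_to_criterion` over a grid of Learning_Rate and Momentum_Term values is one simple way to carry out the rough search described above on a problem of one's own; the shape of the resulting curve depends on the error surface.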

We would not expect that the choice of Learning_Rate or Momentum_Term
would have any significant effect on generalization success, provided the net was
trained to the same final RMSE. Figure 3 shows the correlation between the obtained
and desired output for a test set as a function of the Momentum_Term. As
expected, no substantial effects were found for this parameter, nor were they found
when Learning_Rate was varied. Thus, unless one needs very rapid training, such
as might be required in an on-line training situation, a sensible strategy might be
to guess values for Learning_Rate and Momentum_Term and do a crude search
only if the time to train interferes with the progress of the research.

An issue related to Learning_Rate is that of the proper learning termination
criterion. The goal is to find a level of training that can be achieved in an affordable
period of time, which will also map both the training and test sets to the correct
output within a tolerable error. The tendency is to assume that better performance
on the training set will necessarily mean better performance in generalization testing,
and therefore to train to the lowest error level attainable. We have found that this
strategy generally produces an effect we call overtraining, in which the network weights
come to encode specific characteristics of the training exemplars which are not present
in the test patterns. This causes poorer performance during testing. To avoid this, one
can experiment with different termination levels or increase the size and generality
of the training set, thus making overtraining more difficult.
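
The two measures used in this discussion, RMSE as the termination criterion on the training set and correlation as the measure of generalization on the test set, can be sketched as below. The text does not give the exact formulas, so the usual definitions (root of the mean squared difference, and the Pearson correlation coefficient) are assumed; the function names are hypothetical, and the default limit of 0.06 is the RMSE level quoted for the context window experiments.

```python
import math


def rmse(obtained, desired):
    """Root mean squared error between network output and target."""
    total = sum((o - d) ** 2 for o, d in zip(obtained, desired))
    return math.sqrt(total / len(desired))


def correlation(obtained, desired):
    """Pearson correlation between obtained and desired trajectories."""
    n = len(desired)
    mean_o = sum(obtained) / n
    mean_d = sum(desired) / n
    cov = sum((o - mean_o) * (d - mean_d) for o, d in zip(obtained, desired))
    var_o = sum((o - mean_o) ** 2 for o in obtained)
    var_d = sum((d - mean_d) ** 2 for d in desired)
    return cov / math.sqrt(var_o * var_d)


def reached_criterion(obtained, desired, rmse_limit=0.06):
    """True when training-set RMSE has fallen below the termination level."""
    return rmse(obtained, desired) < rmse_limit
```

Tracking `correlation` on a held-out record at several values of `rmse_limit` is one way to experiment with termination levels and to see where further training stops helping.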

FIGURE 3 Mean correlation between the obtained and desired output during the
generalization test as a function of the Momentum_Term (Speakers 4, 6, record 21,
trained on Speaker 5, record 19). Error bars represent the range of five runs.


For training regimes such as the present one in which there is a relatively
small fixed number of input/output pairs, one can update the network weights after
each pair (pattern training), or after the entire set has been presented (epoch
training). The advantage of epoch training is that the measure of total error,
$\sum_{i=1}^{n} (\mathrm{obtained}_i - \mathrm{desired}_i)^2$, where n is the total number of patterns, is
guaranteed to decline with training, provided the size of the step down the gradient
is sufficiently small. With pattern training no such guarantee exists, and any gain
accomplished by moving down the gradient computed for one pattern may be lost
when moving in the direction specified by another pattern. Nonetheless, we have
found pattern training to produce much more rapid learning, with no loss in quality
of the solution obtained. Of course, pattern training must be accompanied
by random presentations of the patterns in the training set to avoid a limit cycle
where early patterns move the weights in one general direction, later patterns in
another, and then the next presentations of the early patterns move the weights
back again. Apparently, pattern training combined with relatively large steps (high
Learning_Rate) finds weights which reduce the total error because the majority
of the patterns produce gradients which approximate the total gradient, and after
those majority gradients become flatter, the unusual patterns (those with grossly
different gradients relative to the weight position obtained) are then resolved. But
in any case, as a practical matter, pattern training has been vastly superior to epoch
training for this project.
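
The difference between the two update schedules can be shown on a deliberately tiny model. The sketch below uses a single linear unit rather than an MLP, purely to keep the gradient short; it is an assumption-laden illustration of when the weights are changed, not of the network used here, and all names are hypothetical.

```python
import random


def predict(weights, x):
    """Output of a single linear unit."""
    return sum(w * xi for w, xi in zip(weights, x))


def total_error(weights, pairs):
    """Sum over all patterns of (obtained - desired)^2."""
    return sum((predict(weights, x) - d) ** 2 for x, d in pairs)


def gradient(weights, x, d):
    """Gradient of one pattern's squared error with respect to the weights."""
    err = predict(weights, x) - d
    return [2.0 * err * xi for xi in x]


def epoch_training(weights, pairs, learning_rate, epochs):
    """Accumulate the gradient over the whole set, then update once."""
    for _ in range(epochs):
        summed = [0.0] * len(weights)
        for x, d in pairs:
            summed = [s + g for s, g in zip(summed, gradient(weights, x, d))]
        weights = [w - learning_rate * s for w, s in zip(weights, summed)]
    return weights


def pattern_training(weights, pairs, learning_rate, epochs, rng):
    """Update after every pattern, presenting patterns in random order."""
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x, d in pairs:
            g = gradient(weights, x, d)
            weights = [w - learning_rate * gi for w, gi in zip(weights, g)]
    return weights
```

A call such as `pattern_training([0.0, 0.0], pairs, 0.05, 300, random.Random(0))` fixes the random order for repeatability. On a toy problem like this both schedules reach a low total error; the speed advantage reported above for pattern training is an empirical finding on the speech task and is not something this sketch demonstrates.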

THE SIZE AND CONFIGURATION OF THE NETWORK

One way to make training relatively easy is by increasing the number of weights in
the  net,  so  that  there  are  more  free  parameters  to  adjust.  However,  if  the  number 
of  weights  begins  to  approach,  or  even  exceed,  the  number  of  input/output  pairs 
in  the  training  set,  the  weights  can  potentially  encode  a  sort  of  look-up  table  or 
memorization  of  the  training  set  and  thus  fail  to  extract  features  that  are  generally 
characteristic  of  the  input  class.  This  may  cause  the  network  to  fail  to  correctly 
classify  new  exemplars  when  they  are  presented  during  the  generalization  phase. 
However,  just  because  it  is  possible  for  the  net  to  reach  a  poor  solution,  does  not 
mean  that  it  will  actually  do  so  in  practice.  It  is  quite  possible,  especially  if  many  of 
the  input/output  pairs  are  relatively  similar  to  each  other,  that  the  solution  found 
will  be  a  general  one  that  will  successfully  map  new  input  to  the  desired  output. 
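
To make the comparison between weights and training pairs concrete, the number of weights in a fully connected net can be counted directly. The sketch assumes one bias weight per unit, which the text does not state; the layer sizes in the usage note (about 850 inputs, 13 hidden units, one output) are the figures quoted elsewhere in this chapter, and `count_weights` is a hypothetical helper.

```python
def count_weights(layer_sizes, biases=True):
    """Number of adjustable weights in a fully connected feed-forward net.

    layer_sizes lists the units per layer, input layer first.
    """
    extra = 1 if biases else 0
    return sum((n_in + extra) * n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
```

Under these assumptions `count_weights([850, 13, 1])` gives 11077 weights. A 20 sec record cut into 6.4 msec frames yields roughly 3000 input/output pairs, so a net of this size already has more weights than training pairs from a single record, which is the situation discussed above.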

The  easiest  way  to  increase  the  number  of  free  parameters  in  a  fully  connected 
net  is  to  increase  the  number  of  hidden  units  in  the  first  layer.  The  motivation  for 
doing  so  is  to  provide  enough  units  for  relatively  rapid  learning,  while  still  forcing 
the  extraction  of  general  features.  Figure  4  shows  the  time  to  reach  termination 
criterion  as  a  function  of  the  number  of  hidden  units  in  the  first  layer.  As  the 
net  size  increases,  each  pass  through  the  net  takes  more  time  because  there  are 
more  connections  to  evaluate.  However,  fewer  iterations  through  the  training  set 
are  required  because  a  solution  should  be  easier  to  find.  For  this  problem  we  did 
not  find  that  the  number  of  hidden  units  affected  the  success  of  generalization.  This 
result was unexpected because we had felt that fewer hidden units would force more
abstract representations of the input and, thus, produce a better generalization to
input not in the training set. We expected to pay for this benefit by longer training
times and poorer performance on the training set. However, the expected result was
not obtained, perhaps because the few units (> 2) required for successful training
encode features that are equally prominent in the test set.

For some problems it has been shown that adding a second hidden layer to the
network can increase the success of generalization^^ since the second hidden layer
can  form  a  more  abstract  representation  of  the  input  pattern.  While  this  may  be 
true  for  many  problems,  for  this  problem  we  did  not  find  any  advantage  to  adding  a 
second  layer.  For  example,  a  net  with  one  layer  of  13  units  generalizes  from  training 
on  one  recording  to  a  second  recording  from  two  other  speakers  with  a  correlation 
between  the  obtained  and  desired  tongue  dorsum  trajectories  of  r  =  0.76.  Adding 
a  second  layer  of  five  units  produces  a  nearly  identical  correlation  of  r  =  0.77. 

FIGURE 4 Mean cpu time to reach termination criterion as a function of the number
of units in the first hidden layer (Speaker 5, record 19 -> tongue dorsum vertical).
Error bars represent the range of five runs.

Of the parameters that define the state of the network, two are directly concerned
with the representation of speech: the Context_Window_Size, and
Percent_Future, the proportion of the context window which is future or past
relative to the momentary tongue dorsum position serving as target. Since it
is reasonable to presume that values which are useful for one speech problem
will likely also be useful for other similar problems, we first varied the value of
Context_Window_Size from 64 msec to 768 msec, trained on one record from one
speaker, and tested on different records from two other speakers. Figure 5 shows
the mean CPU time required to reach an RMSE of 0.06, while Figure 6 shows the
mean correlation between obtained and desired output during the testing phase.
Fortunately, the two criteria for good parameter settings provide similar answers. A
good Context_Window_Size for both training time and generalization success is
from 256 to 384 msec. This period is about the duration of a spoken syllable, which
might be assumed to be the maximum size that has clear acoustic information
about the phonemes being formed and consequently the tongue motions producing
them. Figures 7 and 8 show the same measures as a function of the proportion of
the context window that is in the past relative to the target. Examination of these
results shows that a good value for this parameter is around 0.4, suggesting that
information from both the past and future is useful, while elimination of all future
acoustic information makes the problem much more difficult.


THE  SIZE  OF  THE  TRAINING  SET 

One  of  the  most  obvious  ways  to  attempt  to  increase  the  success  of  generalization 
is  to  increase  the  size  of  the  training  set.  The  argument  is  that  since  a  single  set 
of  weights  is  now  required  to  map  many  more  examples  to  the  appropriate  out¬ 
put,  the  net  will  be  forced  to  use  features  common  to  all  members  of  each  class  in 
order  to  successfully  learn  the  mapping.  Wh?it  can  happen,  however,  is  that  large 
training  sets  may  make  it  more  likely  that  the  net  will  reach  some  local  minimum 

[Plot title: Speaker 6, record 19 -> tongue dorsum vertical]

FIGURE 5 Mean cpu time to reach termination criterion as a function of the context window size. Error bars represent the range of five runs.

FIGURE 6 Mean correlation between obtained and desired output during the generalization test as a function of the context window size. Error bars represent the range of five runs.

[Plot title: Speaker 6, record 19 -> tongue dorsum vertical. Horizontal axis: proportion of context window before inferred position]

FIGURE 7 Mean cpu time to reach termination criterion as a function of the proportion of context window that is in the past relative to the momentary position of the tongue serving as the target. Error bars represent the range of five runs.

[Plot title: Speakers 4,6, record 21, trained on Speaker 6, record 19]

FIGURE 8 Mean correlation between the desired and obtained output during the generalization test as a function of the proportion of the context window that is in the past relative to the momentary position of the tongue serving as the target. Error bars represent the range of five runs.

FIGURE 9 Mean cpu time to reach termination criterion as a function of the number of input/output pairs in the training set. The upper curve represents the results when the pairs presented to the net were selected at random from the entire training set, while the lower curve shows the results when the pairs were selected sequentially. Error bars represent the mean of five runs.

FIGURE 10 Mean correlation between desired and obtained output during the generalization test as a function of the number of input/output pairs in the training set. The upper curve represents the results when the pairs presented to the net during training were selected at random from the entire set, while the lower curve shows the results when the pairs were selected sequentially. Error bars represent the mean of five runs.

or that training time will become too long for practical application.^'* Furthermore,
if the extra training vectors are not fairly representative of the distribution of the
set of potential input vectors, then the net may overlearn one class of patterns,
since correctly mapping that class will have a disproportionate effect on reducing
the total error, while incorrectly mapping the poorly represented classes will have
a small effect on overall error. Thus, increasing the training set size should not
automatically be considered a wise strategy, though it should clearly be considered if
generalization  success  is  below  the  required  level. 

Figure 9 shows the time to train as a function of training set size. Two techniques
were used to select the members of the training set. In the first technique,
members of the training set were randomly chosen from the 26,912 pairs available
from four recordings from three different speakers, with the restriction that equal
numbers of pairs were taken from each speaker. A fifth recording of each speaker
was reserved for testing. In the second technique, the pairs were selected sequentially
from the recordings, beginning with the first recording from the first speaker
and continuing until all pairs in that record were exhausted, and then more pairs
were included by adding all the pairs in additional recordings. This sequential technique
produced considerably faster training than did random selection because the
training sets contained many vectors that were similar to each other. This was expected
because a recording contains one speaker repeating eight words three times.
While this rapid training might appear to be an advantage, the sequential selection
technique produced problems during generalization. New input vectors that are dissimilar
to those that were well represented in the training set are likely to be poorly
mapped. This effect is illustrated in Figure 10, where the success during testing of
the two selection techniques is plotted against training set size. There the advantage
of the random selection method is clearly shown, since very satisfactory results
were obtained after selection of about 100 vectors from each speaker.
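
The two selection techniques described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names, the record layout (dictionaries with speaker, recording, and index fields), the speaker labels, and the toy data sizes are all hypothetical, and the ordering of recordings in the sequential method is our reading of the description.

```python
import random
from collections import defaultdict


def make_toy_pairs(speakers=("A", "B", "C"), recordings=(1, 2, 3, 4), per_recording=50):
    """Build placeholder records; real input/output vectors would replace 'index'."""
    return [
        {"speaker": s, "recording": r, "index": i}
        for s in speakers
        for r in recordings
        for i in range(per_recording)
    ]


def select_random(pairs, n_per_speaker, seed=0):
    """Random selection, restricted to equal numbers of pairs per speaker."""
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for pair in pairs:
        by_speaker[pair["speaker"]].append(pair)
    chosen = []
    for speaker in sorted(by_speaker):
        chosen.extend(rng.sample(by_speaker[speaker], n_per_speaker))
    return chosen


def select_sequential(pairs, n_total):
    """Sequential selection: exhaust one recording before moving to the next."""
    ordered = sorted(pairs, key=lambda p: (p["speaker"], p["recording"], p["index"]))
    return ordered[:n_total]


pairs = make_toy_pairs()
random_set = select_random(pairs, n_per_speaker=100)
sequential_set = select_sequential(pairs, n_total=300)

# Compare which speakers are represented in two training sets of equal size.
print(sorted({p["speaker"] for p in random_set}))
print(sorted({p["speaker"] for p in sequential_set}))
```

In this toy setting the sequential set of 300 pairs never reaches the third speaker, which is the kind of poor coverage that would be expected to hurt generalization.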


RESULTS 

The purpose of an application of back propagation is generally not to understand
how the network responds to varying its parameters, but rather to obtain a system
that accurately maps new input to the correct output. The degree of accuracy
required generally depends crucially on the uses to which the output will be applied.
Figure 11 shows the actual and inferred tongue dorsum vertical position for one of
the three speakers for a segment of speech not included in the training set. The
correlation between the two curves was r = 0.94, showing that the general shape
of the curves was highly similar, while the magnitudes of the two curves were
somewhat different (RMSE = 0.09). However, for the purpose of identifying when
the velar closure occurred, or for providing a visual display of the tongue dorsum's
movement,  the  inferred  plot  is  clearly  adequate. 
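
The distinction between the two measures can be made concrete with a small sketch. The signals below are synthetic stand-ins (a sine wave and a rescaled, offset copy of it), not the tongue data, and the function names are our own. The sketch shows only that correlation is insensitive to differences in magnitude, whereas RMSE is not.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: sensitive to shape, not to scale or offset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean squared error: sensitive to any difference in magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Synthetic "actual" trace and an "inferred" trace with the same shape
# but a different amplitude and baseline (both are illustrative assumptions).
actual = [0.5 + 0.4 * math.sin(2 * math.pi * t / 50) for t in range(200)]
inferred = [0.6 + 0.3 * math.sin(2 * math.pi * t / 50) for t in range(200)]

print(pearson_r(actual, inferred))
print(rmse(actual, inferred))
```

A curve can therefore track the timing of events such as closures very well while still showing a nonzero RMSE.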

[Plot title: Speaker 6, record 24, trained on Speakers 4,6,6, rec 19,20,21,23]

FIGURE 11 The actual and inferred tongue dorsum vertical movements during the speech segment shown in the acoustic record at the bottom of the figure. The locations of the actual velar closures corresponding to the /k/ phonemes are also shown.


These results were produced by a one-layer network with eight hidden units and
one output unit. Training was done with a Learning_Rate of 1.0 and a
Momentum_Term of 0.5, and was terminated when RMSE reached 0.08. The context
window size was 320 msec with 60% of the window in the future, and the
training set consisted of 6000 randomly selected pairs from four different recordings.
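
For reference, the settings reported above can be collected in one place. The sketch below is a hypothetical container of our own devising (the class and field names are not from the original software); it also works out how the 320 msec context window divides between past and future.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    """Settings reported in the text; field names are illustrative."""
    hidden_units: int = 8
    output_units: int = 1
    learning_rate: float = 1.0
    momentum: float = 0.5
    stop_rmse: float = 0.08
    window_msec: float = 320.0
    future_fraction: float = 0.60
    training_pairs: int = 6000

    def window_split(self):
        """Return (past_msec, future_msec) for the context window."""
        future = self.window_msec * self.future_fraction
        return self.window_msec - future, future


settings = RunSettings()
past_msec, future_msec = settings.window_split()
print(past_msec, future_msec)
```

With 60% of the window in the future, 40% lies in the past, which matches the value of about 0.4 found to work well for the proportion of the window preceding the target.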

Obviously,  since  an  exhaustive  search  of  the  network  parameter  space  was  not 
completed,  some  other  network  configuration  might  have  produced  better  results. 
This is not of particular concern here, however, because we have achieved satisfactory
results with the current training set. With other problems, where good
results have not been achieved, the possibility of improvement by manipulation of
the network parameters presents a problem: how to decide if further tweaking of
the parameters will be beneficial or a waste of computer cycles. In general, our
solution would be to try large variations in the parameters to get a rough idea of
the useful range of values, and then select some moderate value and not attempt
excessive fine tuning. We have found that potent effects on results are more easily
obtained  by  changing  the  input  representation  or  the  training  set  than  by  making 
minor  adjustments  in  network  parameters. 

In  some  respects,  the  results  we  have  achieved  are  suspiciously  good — particu¬ 
larly  the  success  obtained  after  training  on  only  100  randomly  selected  320  msec 
bits  of  speech  from  each  speaker.  Perhaps  we  have  overestimated  the  difficulty  of 
the  problem.  Since  the  speech  signal  is  obviously  caused  by  movements  of  the  vocal 
apparatus, it should not be surprising that a network can easily find a functional
relationship between the two domains. On the other hand, perhaps our success is
the result of choosing training and test sets that are so closely matched that they
are a poor representation for any realistic application.

Clearly, we now need to extend the current results to more difficult problems
involving other speech articulators, differing speaking rates, more general speech
samples, and a greater variety of speakers. Our optimistic hypothesis is that the
experience gained here will transfer to new situations and permit rapid resolution of
new problems as they develop. If this hypothesis is correct, back-propagation techniques
will surely find increasingly widespread application to currently intractable
problems.
Pattern Formation in Biological Systems


Pattern  formation  refers  to  the  processes  in  development  by  which  ordered 
structures  arise  within  an  initially  homogeneous  or  unstructured  system.  Under¬ 
standing  these  processes  is  absolutely  essential  for  understanding  regulatory  mech¬ 
anism  in  development.  It  is  also  essential  for  understanding  the  developmental  ori¬ 
gin  of  biological  form,  and  ultimately,  for  understanding  morphological  evolution. 
In  practice,  pattern  formation  refers  to  things  like  the  processes  in  embryos  that 
determine  where  gastrulation  will  occur,  or  the  processes  that  define  where  bones 
will  condense  in  the  mesenchyme  of  a  developing  limb,  how  many  there  will  be, 
their  shape,  and  their  positional  relation  to  each  other.  Or  in  plants,  where  leaves 
will form on the stem of a plant, and what shape those leaves will have.

Here  we  will  be  particularly  concerned  with  processes  of  pattern  formation  that 
occur  quite  late  in  animal  development,  in  particular,  the  development  of  pigment 
patterns.  Pigment  patterns  have  several  advantages  as  model  systems  in  which 
to  study  the  principles  of  pattern  formation.  First,  color  patterns  are  almost  al¬ 
ways  two  dimensional,  so  they  can  be  studied  on  the  plane  without  having  to  use 
projections or collapse dimensions. This makes them far easier to deal with than
three-dimensional  processes  in  development,  and  makes  color  patterns  particularly 
attractive  for  computer  modeling,  because  the  whole  pattern  can  be  represented  on 
the  two-dimensional  computer  screen.  Second,  since  they  develop  relatively  late,  the 
processes that give rise to the pattern occur in a system that is usually macroscopic
and, therefore, more easily manipulated experimentally than are early embryos.
Third,  since  pattern  is  manifested  as  the  local  synthesis  of  pigment,  it  is  easy  to 
detect.  Fourth,  since  the  chemical  nature  and  biosynthetic  pathways  of  most  pig¬ 
ments  are  known,  it  is  in  principle  possible  to  fully  understand  all  control  pathways 
in  the  system  at  the  chemical  and  molecular  level.  Finally,  it  is  in  systems  like  color 
patterns,  where  all  the  molecular  and  biochemical  steps  are  in  principle  knowable 
and  understandable,  that  we  have  the  best  chance  of  uncovering  the  full  sequence 
of  events  that  links  genotype  and  phenotype,  something  that  has  yet  to  be  done  for 
any  morphological  event. 


THREE  MECHANISMS 

The  processes  that  result  in  local  specialization  of  structure  and  function  can  be 
formally subdivided into two distinctive kinds: those that involve cell migration and
mechanical  interactions  among  cells  (such  as  traction  and  differential  adhesivity), 
and  those  that  involve  chemical  pre-patterning.^'*'^®  Murray^'*  points  out  that  the 
two  mechanisms  are  quite  different  because  in  chemical  pre-patterning  the  chemical 
pattern  precedes  morphogenesis,  while  in  patterning  by  mechanochemical  cell-cell 
interactions,  morphogenesis  is  the  immediate  consequence  of  the  patterning  process. 
There  are  a  few  examples  of  patterning  systems  that  are  purely  one  or  the  other 
(the  formation  of  butterfly  wing  patterns,  which  we  will  deal  with  below,  is  one 
of  them),  but  in  most  cases  both  mechanisms  seem  to  operate,  such  as  when  a 
chemical  gradient  allows  migrating  cells  to  aggregate  and  interact. 

Among  the  best  studied  examples  of  cell  movement-mediated  patterning  are 
aggregation  and  fruiting  body  formation  in  the  slime  mold  Dictyostelium,  and  the 
formation of bones in the developing limbs of vertebrates. In Dictyostelium we have
one of the very few cases in which we actually know what the chemical morphogen
is whose gradient stimulates the initial aggregation. Here the aggregation signal is
cyclic AMP (cAMP), which is secreted by isolated cells when they run out of food.
When other cells perceive this signal, they are attracted to its source and migrate
up the cAMP gradient. Such migrating cells also begin to secrete cAMP themselves,
and a complex set of interactions ensues that transiently gives rise to interesting
cell  aggregation  patterns  and  eventually  results  in  the  clumped  aggregates.  While 
aggregating,  the  population  of  cells  does  exhibit  spatial  patterns  of  spiral  waves 
very  similar  to  those  seen  in  the  Belousov-Zhabotinski  reaction,  and  in  many  models 
using  cellular  automata  (see  figures  in  Winfree,^^  Tomchik  and  Devreotes,'^  and 
Murray^"*). 

Patterned  bone  formation  in  the  mesenchyme  of  developing  vertebrate  limbs 
has been studied in a variety of contexts. Perturbation experiments have revealed
complex  interactions  that  have  been  modeled  conceptually  as  the  well-known  clock 
face  model  of  French  et  al.®  and  Bryant  et  al.,^  and  mechanistically  as  a  traction- 
aggregation  mechanism  by  Oster  et  al.^°  Evolutionary  morphologists  have  been 
particularly interested in the development of bone patterns in vertebrate limbs
because the well-established homologies among the bones, and the extensive historical
pattern of their transformation preserved in the fossil record, make this one
of  the  most  attractive  (and  tractable)  systems  in  which  to  study  the  interplay  of 
developmental  and  evolutionary  processes  in  the  shaping  of  biological  form. '  ■’ 

Pigment  patterns  arise  by  the  same  two  mechanisms,  of  cell  movement  and 
chemical  pre-patterning  and,  in  mollusk  shells,  by  a  third  distinctive  mechanism 
that  involves  a  complex  interplay  between  the  tissue  of  the  mantle  and  the  shell 
as  it  is  secreted.  In  vertebrates,  the  pigment  pattern  of  the  skin  is  produced  by 
melanophores,  which  are  cells  that  produce  the  black/brown  pigment,  melanin. 
Melanophores  arise  from  the  neural  crest  (along  the  dorsal  midline)  early  in  embry¬ 
onic  development  and  from  there  migrate  across  the  body  surface. The  color 
patterns  of  fish,  frogs,  zebras,  giraffes,  and  leopards  are  therefore  the  consequence 
of  the  migration  and  patterned  accumulation  of  pigment-producing  cells. 

In  insect  color  patterns,  the  mechanism  is  quite  different.  The  insect  epidermis 
is  only  one  cell  layer  thick,  and  the  cells  are  attached  to  the  overlying  cuticle  most  of 
the  time.  Cell  migration  and  cell  rearrangement  are  therefore  generally  impossible. 
All  patterning  in  the  epidermis  must  therefore  take  place  by  mechanisms  of  cell- 
to-cell  communication.  The  cells  of  the  insect  epidermis  are  interconnected  by  gap 
junctions  and  are  thus  coupled  electrically  and  are  potentially  coupled  by  diffusion. 
Signals  can  thus  be  transmitted  across  substantial  distances  and  control  over  this 
communication  can  be  exercised  by  modulating  the  number  and  distribution  of  gap 
junctions  that  are  open  at  any  one  time.  Pigment  patterns  are  thus  the  result  of 
local  cell  differentiation  in  a  static  monolayer  of  cells.  Formation  of  the  pattern 
does  not  involve  cell  migration,  nor  is  the  pattern  subsequently  modified  by  cell 
rearrangement. 

In  the  shells  of  gastropods  (snails)  and  bivalves  (clams),  the  color  pattern  is 
laid  down  as  the  shell  is  secreted.  The  pigment  of  the  pattern  resides  not  in  cells 
but  in  the  non-living  shell.  In  contrast  to  the  two  previous  cases,  pattern  formation 
in  shells  is  essentially  a  one-dimensional  process.  During  growth  the  mantle  adds 
material  to  the  leading  edge  of  the  shell  and  at  the  same  time  secretes  pigments 
at  appropriate  locations  to  produce  species-characteristic  color  patterns  (stripes, 
spots, zig-zags, etc.). The mantle is a motile organ and moves frequently relative
to  the  margin  of  the  shell  as  the  animal  locomotes,  rests,  and  hides.  Consequently, 
shell  deposition  is  not  continuous  but  shows  both  regular  and  erratic  periods  of 
growth  and  rest.  The  mantle  ultimately  controls  where  and  when  the  shell  will 
grow,  and  also  where  exactly  pigment  will  be  deposited.  The  pigment  pattern  is 
thus  the  result  of  the  behavior  of  the  whole  mantle  and  of  the  way  it  interacts  with 
the  growing  edge  of  the  shell. 

The  color  patterns  of  vertebrates,  of  insects,  and  of  mollusk  shells  thus  come 
about  by  three  fundamentally  different  mechanisms.  Theoretical  work  has  shown, 
however, that the essence of these three pattern-forming processes can be captured by
very  similar  sets  of  mathematical  equations.' This  suggests  that  the  principles 
involved  in  each  process  could  be  fundamentally  similar  even  though  the  actual 
mechanisms  are  not.  In  almost  all  cases,  lateral  inhibition  (short-range  activation 
coupled with long-range inhibition of a particular event) provides the organizing
mechanism, and, while systems may differ in the exact means by which effective
activation and inhibition are achieved (e.g., cooperativity, autocatalysis, or positive
feedback,  versus  catabolism,  interference,  or  competition),  the  final  spatial  results 
of  the  process  are  similar  if  not  identical. 

In  the  sections  that  follow  we  will  assume,  for  the  sake  of  simplicity,  that  chem¬ 
ical  pre-patterning  is  the  process  at  work  because  such  a  process  can  be  modeled 
without  having  to  take  account  of  the  movement  of  cells  relative  to  one  another.  In 
addition,  we  will  assume  a  perfectly  two-dimensional  system.  Thus,  what  follows 
will  apply,  strictly  speaking,  only  to  pattern  formation  in  the  insect  integument.  As 
we  will  see,  these  assumptions  produce  a  rich,  complex,  and  largely  non-intuitive 
world  of  patterns,  that  begs  further  exploration,  both  experimental  and  theoretical. 


DIFFUSION 

In  biological  systems,  convection  (usually  via  a  circulatory  system)  and  diffusion 
(within  cells  and,  via  gap  junctions,  between  cells)  provide  the  most  common  means 
of chemical communication within and among cells and tissues. Convection is gener¬
ally  used  for  long-range  transport  and  appears  to  play  no  role  in  any  of  the  pattern 
formation  systems  that  have  been  studied  so  far.  Thus,  to  understand  patterning, 
we  need  to  understand  the  mechanism  and  consequences  of  diffusion. 

Diffusion  comes  about  by  the  random  movement  of  particles  produced  by  ther¬ 
mal  agitation.  The  mathematics  of  diffusion  has  been  widely  studied,  and  the  reader 
is referred to the text by Crank'' for the fundamentals, and to Carslaw and Jaeger^
for  a  more  elaborate  treatment  of  special  cases.  In  one  spatial  dimension,  the  dif¬ 
fusion  equation  is  usually  written  as: 

$$\frac{\partial c}{\partial t} = D \nabla^2 c \qquad (1)$$

where c is the concentration of the diffusing substance, and D is the diffusion
coefficient. On macroscopic scales diffusion is a slow process. The dimension of
the diffusion coefficient, D, can be used to get an idea of the rate of diffusion. If
the diffusion coefficient is expressed in the units cm²/sec, then the average time (in
seconds) it takes for a particle to diffuse through a given distance, d (in centimeters),
is approximately $d^2/D$.^ Moderately large biochemical molecules diffuse through
the cytoplasm of a cell with $D = 10^{-7}$ cm²/sec. Such a molecule would take an average of
$(10^{-3})^2/10^{-7} = 10$ seconds to diffuse across the diameter of a typical 10 micron
cell. The average distance over which diffusion acts within a given period of time
is proportional to $(Dt)^{1/2}$.® Even though diffusion is an inherently slow process, it
does  clearly  provide  a  relatively  effective  means  of  communication  over  the  small 
distances  (usually  1  mm  or  less)  and  time  periods  (hours  to  days)  that  are  relevant 
to  most  developmental  systems. 
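
The order-of-magnitude estimate above is easy to reproduce. The sketch below assumes a diffusion coefficient of 10⁻⁷ cm²/sec, the value implied by the 10-second figure for a 10 micron cell. The function names are ours, and the relation t ≈ d²/D is the approximation quoted in the text, not an exact solution of the diffusion equation.

```python
import math


def diffusion_time_sec(distance_cm, d_coeff):
    """Approximate mean time to diffuse a distance d: t ~ d^2 / D."""
    return distance_cm ** 2 / d_coeff


def diffusion_range_cm(time_sec, d_coeff):
    """Characteristic distance reached in time t, proportional to sqrt(D t)."""
    return math.sqrt(d_coeff * time_sec)


D = 1e-7  # cm^2/sec, assumed value for a moderately large molecule in cytoplasm

t_cell = diffusion_time_sec(1e-3, D)  # across a 10 micron cell (1e-3 cm)
t_mm = diffusion_time_sec(1e-1, D)    # across 1 mm (1e-1 cm)

print(t_cell)         # seconds
print(t_mm / 3600.0)  # hours
```

Because the time grows with the square of the distance, a hundredfold increase in distance (from 10 microns to 1 mm) raises the estimate from seconds to roughly a day, which is consistent with the hours-to-days time scale of developmental systems.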
Diffusion-dependent processes can also exert their effect rapidly and over much
larger distances if they are coupled to some amplifying machinery. The diffusion of a
large charged molecule (say, $D = 10^{-7}$ cm²/sec) across a cell membrane can rapidly
change the local balance of charge, and cause the diffusion of small ions towards
or away from the area. If the small ions have a diffusion coefficient of, say, $10^{-5}$ cm²/sec
and act as intermediate messengers, then the rate of “signal” propagation caused by
diffusion of the large molecule would have been amplified 100-fold. The propagation
of  an  action  potential,  which  is  basically  a  cytochemical  cascade  mechanism,  is  a 
well-known  example  of  the  amplification  of  a  diffusible  signal. 


EVOKING  AN  EFFECT:  THRESHOLDS 

Simple  linear  diffusion  from  a  source  into  a  medium,  or  from  a  source  to  a  sink, 
sets  up  a  gradient  in  the  concentration  of  the  diffusing  substance.  The  concentra¬ 
tion  at  a  particular  point  (p)  along  such  a  gradient  carries  information.  It  can  be 
used  to  estimate  both  the  distance  between  p  and  the  source  (or  the  sink),  and  the 
time  since  diffusion  began.  For  the  purposes  of  pattern  formation,  the  former,  the 
estimation of position within a diffusion field, is the more interesting and useful
one.  In  the  simplest  case  of  pattern  formation,  diffusion  from  a  point  source  sets 
up  a  gradient  of  a  chemical  across  an  otherwise  homogeneous  developmental  field, 
and  some  novel  developmental  event  is  caused  to  occur  wherever  the  concentration 
of  the  gradient  is  above  (or  below)  some  critical  value.  As  we  will  see  below,  the 
eyespots  in  the  wing  patterns  of  butterflies  are  produced  by  just  such  a  simple 
mechanism.  Changes  in  the  threshold,  and  changes  in  the  shape  of  the  gradient  can 
both alter the dimension and position of the “pattern” within the total field. The
formal  requirements  and  consequences  of  pattern  formation  by  such  simple  gradi¬ 
ent  systems  have  been  explored  by  Lewis  Wolpert^"^  in  his  “Theory  of  Positional 
Information.” 
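
How a threshold converts a gradient into a pattern of definite size can be illustrated with a deliberately simple sketch. We assume, purely for illustration, an exponentially decaying concentration profile around the source; the profile, the parameter names, and the numerical values are assumptions of ours and are not taken from the text.

```python
import math


def gradient(x, c0=1.0, decay_length=1.0):
    """Assumed profile: concentration falls off exponentially with distance x."""
    return c0 * math.exp(-x / decay_length)


def pattern_radius(threshold, c0=1.0, decay_length=1.0):
    """Distance from the source within which concentration exceeds the threshold."""
    if threshold >= c0:
        return 0.0  # the threshold is never exceeded, so no pattern forms
    return decay_length * math.log(c0 / threshold)


print(pattern_radius(0.5))                    # reference pattern
print(pattern_radius(0.25))                   # lower threshold
print(pattern_radius(0.5, decay_length=2.0))  # shallower gradient
```

Under this assumption, lowering the threshold or flattening the gradient both enlarge the region in which the developmental event occurs, which is the sense in which either change can alter the dimension of the pattern.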

On  this  view,  the  problem  of  pattern  formation  is  twofold:  first,  how  to  establish 
a  source  for  the  diffusing  signal,  and  second,  how  to  retrieve  the  information  in  the 
diffusion gradient. The first of these problems is by far the more difficult one, and
we will take it up below. The second problem can be rephrased to ask: how do you
set up a threshold so that the continuously distributed gradient in one substance
(the  diffusing  signal)  is  translated  into  a  sharply  discontinuous  and  stable  change 
of  some  developmental  or  biochemical  event? 

Lewis et al.^^ have developed an elegant model for a threshold mechanism.
They  note  that  most  threshold  models  assume  an  allosteric  enzyme  whose  activity 
is  a  sigmoidal  function  of  substrate  concentration  (Figure  1).  The  problem  with 
such  a  model  is  that  along  a  gradient  of  substrate  concentration  the  transition  from 
the inactive form to the active form of the enzyme is gradual and occurs over a relatively
long distance (Figure 1). Increasing the number of cooperating subunits in
the  enzyme  increases  the  steepness  of  the  sigmoid  transition  and  thus  sharpens  the 
“threshold” to some degree. Allosteric enzymes generally, however, have no more
than four subunits, and that puts a practical limit to the refinement of a threshold
by this means. Lewis et al.*^ suggested a modification of the allosteric model to include
also a linear degradation term. They suggest the following structure. Suppose
a gene G, which produces a product g, that stimulates its own synthesis by positive
feedback at a rate that is a sigmoidal function of its concentration ($k_2 g^2/(k_3 + g^2)$),
and that g breaks down at a rate proportional to its concentration ($-k_4 g$). Suppose
further that the synthesis of g is also stimulated by a signal molecule S, at a
rate that is linearly proportional to the concentration of S. This gives the following
relationship:

$$\frac{dg}{dt} = k_1 S + \frac{k_2 g^2}{k_3 + g^2} - k_4 g \qquad (2)$$

which  is  shown  graphically  in  Figure  2.  The  graph  of  the  rate  of  production  of  g  is  in 
effect  an  inclined  sigmoid  curve  whose  position  is  controlled  by  the  value  of  S.  When 
S  is  small,  the  reaction  has  three  steady  states,  two  of  which  are  stable  (Figure  2). 
If  the  system  starts  with  gene  G  off,  and  thus  with  no  g  present,  the  concentration 
of g will tend towards its low steady state. Small and moderate perturbations in
its concentration will always cause g to return to this low steady state. However,
if the concentration of S goes up, the level of the curve rises and there is eventually
only one steady state of g (Figure 2), much higher than the previous one. Thus,

FIGURE 1 Allosteric model for a threshold. The concentration gradient in S activates an allosteric enzyme that obeys the Hill equation. The degree of saturation of the enzyme, y, that corresponds to various points along the gradient is shown in the lower graph. The threshold provided by this mechanism is not sharp and the transition can extend across many cells. Reprinted by permission of the publisher from “Thresholds in Development” by J. Lewis et al., J. Theor. Biol. 65, 579-590. Copyright © 1977 by Academic Press Inc. (London).

FIGURE 2 Curves produced by Eq. (2) for three values of the signal substance, S. As S gradually increases, the number of stable states abruptly falls from two to one.


if S increases gradually, there will be a sudden transition in the concentration of
g from its low to its high steady state. A smooth and continuous change in the
concentration of S thus results in an abrupt switch in the concentration of g. This
gives us, then, a mechanism for a sharp threshold in the control variable, S, with
no  intermediate  values  between  the  extremes  of  the  response  variable,  g. 

An  additional  interesting  and  useful  feature  of  this  model  is  that  it  has  a  kind  of 
“memory”  because,  once  g  has  switched  to  its  higher  steady  state,  it  will  stay  there 
even  if  S  subsequently  declines  or  disappears.  Thus  we  have  essentially  a  mechanism 
for  the  irreversible  activation  of  a  gene.  If  such  a  gene  controls,  for  instance,  the 
synthesis  of  pigment-forming  enzymes,  then  we  have  a  mechanism  for  producing  a 
patch of pigment wherever the concentration of S is above the threshold defined by
Eq.  (2). 
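
Both the sharp threshold and the memory of this model can be seen in a short numerical sketch. We read Eq. (2) as dg/dt = k1·S + k2·g²/(k3 + g²) − k4·g and integrate it with a simple forward Euler scheme. The rate constants, the signal levels, and the function names are illustrative assumptions chosen so that the system is bistable when S is small; they are not values from Lewis et al.

```python
def dg_dt(g, s, k1=1.0, k2=1.0, k3=1.0, k4=0.4):
    """Rate of change of the gene product g for a signal level s."""
    return k1 * s + k2 * g ** 2 / (k3 + g ** 2) - k4 * g


def relax(g, s, t_end=200.0, dt=0.01):
    """Integrate with forward Euler until g has settled near a steady state."""
    for _ in range(int(t_end / dt)):
        g += dt * dg_dt(g, s)
    return g


g_low = relax(0.0, s=0.02)       # weak signal: g settles in the low state
g_high = relax(g_low, s=0.10)    # stronger signal: the low state no longer exists
g_memory = relax(g_high, s=0.0)  # signal removed: g remains in the high state

print(g_low, g_high, g_memory)
```

With these constants a fivefold change in S moves g from a low value to a high one, and g then stays high after S is withdrawn, which is the irreversible activation described above.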


REACTION  DIFFUSION 

Pattern  formation  by  diffusion  gradients  requires  at  the  very  minimum  the  existence 
of  a  source  of  the  diffusing  chemical.  If  pattern  regulation  is  important,  then  a  sink 
is  also  essential,  so  that  all  intermediate  values  of  the  gradient  are  always  present 
within  the  developmental  field.  It  should  be  clear  that  this  requirement  for  a  source 
(and  a  sink)  in  effect  pushes  the  problem  of  pattern  formation  back  one  step,  and 
the  issue  becomes  one  of  determining  what  causes  the  sources  and  sinks  to  be  where 
they  are. 

Though  unsatisfying  from  a  mathematical  point  of  view,  such  potentially  in¬ 
finite  regressions  in  control  mechanisms  are  biologically  reasonable  and  probably 
the  rule  rather  than  the  exception.  Development  is,  after  all,  a  complex  network  of 
causal  connections  in  which  any  process  works  correctly  only  if  all  the  preceding 
ones did (at least within certain tolerances). There are, however, certain conditions
under which a stable pattern can emerge in an initially homogeneous and
randomly  perturbed  field  without  the  need  for  initial  sources  or  organizing  cen¬ 
ters.  The  conditions  under  which  this  can  occur  were  discovered  by  Turing'^  and 
this  discovery  constitutes  one  of  the  major  advances  ever  made  in  the  theory  of 
biological  development.  Turing'®  showed  that  the  steady-state  condition  of  certain 
kinds  of  biochemical  reactions  can  be  made  spatially  unstable  if  at  least  two  of  the 
reactants  are  able  to  diffuse.  In  other  words,  if  the  reactants  are  free  to  diffuse, 
then  it  is  possible  for  them  to  become  stably  patterned  into  areas  of  high  and  ar¬ 
eas  of  low  steady-state  concentrations.  On  first  sight  this  is  a  non-intuitive  result, 
because  one  generally  thinks  of  diffusion  as  having  a  homogenizing  effect.  Under 
certain  conditions,  however,  diffusion  can  act  to  amplify  spatial  waves  of  certain 
critical  frequencies.  The  mathematics  behind  this  process  were  outlined  by  Tur¬ 
ing  and  have  been  more  fully  explored  by  many  other  authors  since.  Particularly 
readable accounts of the theory and the conditions under which such diffusive instabilities
arise in chemical reaction systems are given by Segel and Jackson''^ and
by Edelstein-Keshet,*^ and a more technical treatise with many examples is given
by Murray.^"* The most elaborate exploration of the consequences and possible uses
of one class of these reaction-diffusion mechanisms is given by Meinhardt.*'

The  conditions  necessary  for  chemical  pattern  formation  in  reaction-diffusion 
systems are given by Edelstein-Keshet as follows:

1.  There  must  be  at  least  two  chemical  species. 

2.  These  chemicals  must  affect  each  other’s  rate  of  production  and/or  breakdown 
in  particular  ways. 

3.  These  chemicals  must  also  have  different  diffusion  coefficients. 

The general equation system for reaction diffusion is:

$$\frac{\partial A}{\partial t} = F(A,B) + D_A \nabla^2 A,$$

$$\frac{\partial B}{\partial t} = G(A,B) + D_B \nabla^2 B,$$

in which F(A, B) and G(A, B) define the reaction equations for the two interacting
chemical species.

Most  mechanisms  for  chemical  patterning  produce  a  set  of  conditions  that  are 
referred  to  as  lateral  inhibition.  What  this  means  is  that  one  of  the  chemicals,  usually 
called  the  activator,  has  a  low  diffusion  coefficient  and  exerts  its  influence  over  a 
fairly  short  range  while  the  other,  called  the  inhibitor,  has  a  much  higher  diffusion 
coefficient  and  thus  exerts  its  effect  over  a  much  longer  range.  The  term  is  derived 
from  physiology  where  similar  short-range  activation,  long-range  inhibition  systems 
are  common,  and  particularly  well  studied  in  the  retina  where  lateral  inhibition  is 
in  part  responsible  for  the  detection  of  edges  and  patterns. 

Three reaction-diffusion systems have achieved particular popularity for problems
in developmental biology and biological pattern formation. The model of
Schnakenberg is one of the simplest systems that exhibits chemical pattern formation.
Its reaction dynamics are given by:

$$F(A,B) = k_1 - k_2 A + k_3 A^2 B,$$

$$G(A,B) = k_4 - k_3 A^2 B.$$

The lateral inhibition system of Meinhardt is the one whose behavior has been
studied most extensively:

$$F(A,B) = k_1 - k_2 A + \frac{k_3 A^2}{B},$$

$$G(A,B) = k_4 A^2 - k_5 B.$$

The reaction system of Thomas, while more complicated than the preceding two,
has the virtue that it is the only system that is empirical, based on real chemistry.
It involves three reactants as follows:

$$F(A,B) = k_1 - k_2 A - H(A,B),$$

$$G(A,B) = k_3 - k_4 B - H(A,B),$$

$$H(A,B) = \frac{k_5 A B}{k_6 + k_7 A + k_8 A^2}.$$

For  many  purposes  it  is  convenient  to  express  equations  such  as  these  in  a 
nondimensional  form.  One  reason  is  that  nondimensionalization  always  reduces  the 
number  of  parameters  in  the  model,  which  simplifies  the  analysis  of  the  scope  of  the 
model.  Another  is  that  it  removes  the  units  of  measurement  and  thus  allows  one  to 
examine the effects of scale more effectively. Murray suggests the following
general nondimensional form for reaction-diffusion systems:

$$u_t = \gamma f(u,v) + \nabla^2 u, \qquad v_t = \gamma g(u,v) + d \nabla^2 v. \tag{6}$$

With the appropriate scaling, the reaction dynamics for the three systems mentioned
above can be rewritten as follows:

$$f(u,v) = a - u + u^2 v, \qquad g(u,v) = b - u^2 v \tag{7}$$

for the Schnakenberg system;




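$$f(u,v) = a - bu + \frac{u^2}{v}, \qquad g(u,v) = u^2 - v \tag{8}$$

for the Meinhardt system; and

$$f(u,v) = a - u - h(u,v), \qquad g(u,v) = \alpha(b - v) - h(u,v), \qquad h(u,v) = \frac{\rho u v}{1 + u + K u^2} \tag{9}$$

for the Thomas system.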
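The three nondimensional kinetics can be written down directly as functions, which is a convenient starting point for any numerical exploration. The sketch below is illustrative only: the function names are ours, the parameter values in the usage line are arbitrary, and the closed-form steady states are derived here by setting f = g = 0 rather than taken from the sources cited (the Thomas steady state has no comparably simple closed form and is omitted).

```python
def schnakenberg(u, v, a, b):
    """Nondimensional Schnakenberg kinetics: returns (f, g)."""
    return a - u + u**2 * v, b - u**2 * v


def meinhardt(u, v, a, b):
    """Nondimensional Meinhardt (activator-inhibitor) kinetics: returns (f, g)."""
    return a - b * u + u**2 / v, u**2 - v


def thomas(u, v, a, b, alpha, rho, K):
    """Nondimensional Thomas kinetics: returns (f, g)."""
    h = rho * u * v / (1.0 + u + K * u**2)
    return a - u - h, alpha * (b - v) - h


def schnakenberg_steady_state(a, b):
    # Adding f = 0 and g = 0 gives u0 = a + b; g = 0 then gives v0.
    u0 = a + b
    return u0, b / u0**2


def meinhardt_steady_state(a, b):
    # g = 0 gives v0 = u0**2; substituting into f = 0 gives u0 = (a + 1) / b.
    u0 = (a + 1.0) / b
    return u0, u0**2


if __name__ == "__main__":
    # Arbitrary illustrative parameters; both residuals should be close to zero.
    u0, v0 = schnakenberg_steady_state(0.1, 0.9)
    print(schnakenberg(u0, v0, 0.1, 0.9))
```

Keeping the kinetics separate from the diffusion terms makes it easy to swap one reaction scheme for another while leaving the rest of a simulation unchanged.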

The parameter $d$ in Eq. (6) corresponds to the ratio of the diffusion coefficients
of inhibitor and activator, while the parameter $\gamma$ represents the scale of the
system. Murray suggests that $\gamma$ is proportional to the area of the system, for
two-dimensional diffusion. $\gamma$ can also represent the strength of the reaction term
relative to the diffusion term. An increase in $\gamma$ can be offset by a decrease in $d$. The
advantage of having a single variable that can represent the scale of the system
is that the consequences of pattern formation in a growing system can be easily
studied, and predictions can be made about the differences in pattern that would
be produced when the same mechanism acts in developmental fields of different
sizes. Both features are of interest to developmental biologists who perforce deal
with many systems that undergo growth during the period of study.

The advantages of nondimensionalized systems in facilitating studies on the
effects of scaling are offset, for the biologist at least, by the fact that other
biologically important parameters (such as the reaction constants for the synthesis and
breakdown  of  specific  chemical  species)  become  inaccessible  to  manipulation.  Since 
such  reaction  constants  provide  the  only  direct  link  to  the  genome  (genes  code 
for  enzymes,  whose  activity  is  represented  by  the  reaction  constant),  it  becomes 
virtually  impossible  to  study  the  effects  of  single  gene  alterations.  Thus  biologists 
interested  in  exploring  the  potential  of  gradualistic  accumulations  of  small  genetic 
changes  to  cause  gradualistic  (or  discontinuous)  morphological  change  will  need  to 
work  with  fully  dimensional  forms  of  a  system. 

In addition to the general conditions for chemical pattern formation mentioned
above, there are several specific conditions that must be met. These are treated in
detail and with several examples by Segel and Jackson, Edelstein-Keshet, and
Murray. Only the summary conclusions will be given here. The form of the
nullclines (the graphs of dx/dt = 0) of the reactions gives essential information on
whether diffusive instability is in principle possible. The character of the crossover
point of the two nullclines (the system's steady state) is critical: the nullclines of the
activator and inhibitor must both have positive slopes or both have negative slopes
at steady state, and, in either case, the slope of the inhibitor nullcline must be
steeper than that of the activator nullcline. The nullclines for the nondimensional
forms of the three reaction-diffusion equations listed above are given in Murray.
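For reference, the conditions that linear analysis places on the kinetics can be written compactly. The block below is the standard textbook statement for a two-species system of the nondimensional form $u_t = \gamma f + \nabla^2 u$, $v_t = \gamma g + d\nabla^2 v$; it is offered as a summary sketch rather than a quotation from the sources cited, and the partial derivatives $f_u$, $f_v$, $g_u$, $g_v$ are assumed to be evaluated at the homogeneous steady state.

```latex
% Conditions for diffusion-driven instability of
%   u_t = \gamma f(u,v) + \nabla^2 u,   v_t = \gamma g(u,v) + d \nabla^2 v
\begin{align}
  f_u + g_v &< 0, \\
  f_u g_v - f_v g_u &> 0, \\
  d f_u + g_v &> 0, \\
  (d f_u + g_v)^2 - 4d\,(f_u g_v - f_v g_u) &> 0 .
\end{align}
```

The first two inequalities state that the steady state is stable in the absence of diffusion; the last two state that diffusion destabilizes some band of wavenumbers. The first and third together can only hold if $d \neq 1$, which is the requirement that the two species have different diffusion coefficients.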

Whether or not a system with nullclines of the required shapes will exhibit
diffusive instability depends critically on the values of the parameters, and these
are specific to each system. Murray has worked out the parameter space for the
nondimensionalized forms of the three reaction systems listed above, and has shown
that it is surprisingly narrow. In almost all cases parameters must be chosen
with considerable precision and the choice of one parameter value places significant
constraints on the possible values of the remaining parameters. Once the parameter
space for a given system is known, however, it can form the basis for the numerical
exploration of its pattern-forming properties.

The analysis of the general behavior of a reaction-diffusion system is not a trivial
matter. Because reaction-diffusion systems involve coupled nonlinear equations,
they usually cannot be solved analytically and their behavior must be studied by
numerical simulation. It is, however, possible to get a general idea of how a particular
system behaves by studying perturbations near the steady state of a linearized
system (see Edelstein-Keshet for the description of a method). Such a linear theory
approach can predict the number of modes that will form after random perturbation
of a field of given dimensions. Arcuri and Murray have used linear theory
to predict the pattern generated by the nondimensionalized Thomas system in one
space dimension (Figure 3(a)). The theory predicts a regular increase in the number
of modes as either $d$ or $\gamma$ increases (Figure 3(a)). Numerical simulation of the full
nondimensional Thomas system, however, gives somewhat different results (Figure
3(b)). Odd modes appear to be favored over even modes, something which linear
theory does not predict. The solution space for modes 2 and 6 is particularly small
in the full nonlinear system.
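The kind of prediction that linear theory makes can be illustrated with a short calculation. The sketch below is not the Arcuri and Murray analysis: it substitutes the Schnakenberg kinetics for the Thomas system (because the Schnakenberg steady state has a closed form), assumes a one-dimensional domain of unit length with no-flux boundaries so that the admissible wavenumbers are $k = n\pi$, and uses arbitrary illustrative parameter values. A mode is counted as unstable when the linearized system $\gamma J - k^2 D$ has an eigenvalue with positive real part; the function names are ours.

```python
import numpy as np


def schnakenberg_jacobian(a, b):
    """Jacobian of the nondimensional Schnakenberg kinetics at its steady state."""
    u0 = a + b
    v0 = b / (a + b) ** 2
    return np.array([[-1.0 + 2.0 * u0 * v0, u0**2],
                     [-2.0 * u0 * v0, -(u0**2)]])


def unstable_modes(jac, d, gamma, length=1.0, n_max=50):
    """Mode numbers n whose perturbation cos(n*pi*x/length) grows in linear theory."""
    diffusion = np.diag([1.0, d])
    modes = []
    for n in range(n_max + 1):
        k2 = (n * np.pi / length) ** 2
        growth = np.linalg.eigvals(gamma * jac - k2 * diffusion).real.max()
        if growth > 0.0:
            modes.append(n)
    return modes


if __name__ == "__main__":
    jac = schnakenberg_jacobian(a=0.1, b=0.9)      # illustrative values only
    print(unstable_modes(jac, d=10.0, gamma=100.0))  # prints [2]
    print(unstable_modes(jac, d=10.0, gamma=400.0))  # prints [3, 4]
```

With these values the band of unstable modes moves to higher mode numbers as $\gamma$ grows, which is the regular progression that linear theory predicts. Which of the unstable modes actually wins in the full nonlinear system cannot be read off from this calculation.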

Arcuri and Murray also calculated how the modes of a Thomas system would
behave in a growing field. As the field grows, it can support a progressively larger
number of modes. Existing modes appear initially to split in two, which would
result in a doubling of the number of modes. But this is not what happens. Instead,
at a critical point the system appears to become unstable and reorganizes so that
only a single mode is added. Only in some cases is more than one mode added as
the field grows, which is consistent with the behavior shown in Figure 3(b), where
many odd numbered modes are adjacent.

In two dimensions the succession of modes is more complicated, and is critically
dependent on both the chemistry of the system and the geometry of the field. In
a rectangular field two perpendicular sets of waves are possible, and as the scale
of the field increases, more waves can be fitted along both axes. The succession
of modes for a general reaction-diffusion system on a rectangular field with no-flux
boundaries has been studied by Edelstein-Keshet, and is shown in Figure
4 for a field of dimension 1 × 2. The quantity $E^2$ in Figure 4 has the following
correspondence:

$$E^2 = m^2 + \frac{n^2}{\gamma^2} \tag{10}$$

where m and n are integers that represent the number of wavelengths parallel to
the x and y axes, respectively, and $\gamma$ is the dimension of the field parallel to the
y axis divided by the dimension parallel to the x axis. The succession of modes
is then given by the sequence of values of m and n, ranked in order of increasing
values of $E^2$. Figure 5 illustrates the patterns that correspond to several of these
modes. In a real system $E^2$ can be derived as a function of the area of the field and
the ranges of activation and inhibition, and thus the succession of modes shown in
Figures 4 and 5 can be the consequence of gradual changes in any of these three
parameters. Of course, fields of different shapes may have a different succession of
modes, determined also by the value of $\gamma$.

FIGURE 3 Solution space for the nondimensionalized Thomas system (Eq. (9)) in
one spatial dimension, with no-flux boundary conditions. (a) Modes for various values
of $d$ and $\gamma$ obtained from linear theory. (b) Solution space obtained by simulation of the
full nonlinear system. (c) Spatial distribution of morphogen concentration for several
of the regions indicated in (b). Reprinted by permission of the publisher from "Pattern
Sensitivity to Boundary and Initial Conditions in Reaction-Diffusion Models" by P. Arcuri
and J. D. Murray, J. Math. Biol. 24, 141-165. Copyright © 1986 by Springer-Verlag.

FIGURE 4 Progression of modes of a typical reaction-diffusion system in two dimensions,
with increasing values of $E^2$. The modes m and n correspond to the number of wave
peaks supported in the x and y direction, respectively, as the domain size increases,
or as the range of the inhibitor decreases. Reprinted by permission of the publisher
from Mathematical Models in Biology by L. Edelstein-Keshet. Copyright © 1988 by
Random House Publishers.

FIGURE 5 Examples of the first seven two-dimensional patterns predicted for a typical
reaction-diffusion system under the conditions described in Figure 4. Reprinted by
permission of the publisher from Mathematical Models in Biology by L. Edelstein-Keshet.
Copyright © 1988 by Random House Publishers.
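The ranking of modes by Eq. (10) is easy to reproduce. The sketch below assumes a field whose y dimension is twice its x dimension ($\gamma = 2$, matching the 1 × 2 field discussed above); the function name, the cutoff `max_index`, and the tie-breaking rule for modes with equal $E^2$ are our own choices, so the order within a tie need not match Figure 4.

```python
def mode_succession(aspect, count, max_index=10):
    """Rank (m, n) mode pairs by E^2 = m^2 + n^2 / aspect^2, smallest first."""
    modes = [
        (m * m + n * n / aspect**2, m, n)
        for m in range(max_index + 1)
        for n in range(max_index + 1)
        if (m, n) != (0, 0)  # skip the uniform, patternless state
    ]
    modes.sort()  # ties in E^2 are broken by m, then n (an arbitrary choice)
    return [(m, n) for _, m, n in modes[:count]]


if __name__ == "__main__":
    # First six modes on a field twice as long in y as in x.
    print(mode_succession(aspect=2.0, count=6))
    # prints [(0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (0, 3)]
```

Because the y axis is the longer one, waves along y (n > 0) appear before the first wave along x, and modes (0, 2) and (1, 0) share the same value $E^2 = 1$.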


In circular and elliptical fields, the succession of modes is also different.
Kauffman has calculated the mode progression for a general reaction-diffusion
system on an elliptical domain of increasing size and showed that the succession
of nodal lines on such a growing field was very similar to the succession of
compartmental boundaries that form in the wing imaginal disks of Drosophila. It may
therefore be that the progressive compartmentalization of the Drosophila imaginal
disks is the simple and spontaneous consequence of a reaction-diffusion system
operating on a growing domain. It is interesting to note that the succession of
modes is also similar to the succession of nodes in vibrating circular and elliptical
plates. Xu et al. have shown that the vibrational modes of plates of more complex
shapes also correspond generally to the pattern boundaries produced by
reaction-diffusion systems on similar-shaped fields. Murray has noted that the initial stages
of chemical pattern formation by reaction-diffusion pose the same mathematical
eigenvalue problem as that describing the vibration of thin plates. Thus, assuming
equivalent boundary conditions can be established on vibrating plates, we may have
here an analog model of pattern formation by a general reaction-diffusion system
that solves for the pattern almost instantaneously, and would therefore afford a fast
and efficient way of exploring patterning in complex geometries.


A  DISCRETE  MODEL  OF  PATTERN  FORMATION  BY  LATERAL 
INHIBITION 

Young has demonstrated that instead of using continuous partial differential
equations to describe pattern formation by reaction diffusion, it is possible to obtain
equivalent results with a completely discrete model that captures the essence of
lateral inhibition but does not require solution of the diffusion equation. Young's
theory is modeled on the one proposed by Swindale for explaining patterns in the
visual cortex of the brain.

Young models the combined effect of a short-range activator and a long-range
inhibitor by assuming that around each "source" cell there are two concentric
circular regions: an inner one where there is a constant positive value of some
control parameter, and an outer one where there is a constant but negative value
of the same parameter (Figure 6). This condition corresponds to the short-range
activation and long-range inhibition of a lateral inhibition model, the principal
difference being that in reaction-diffusion systems the "activity" of the activator
and inhibitor declines gradually with distance from the center of activation.
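A minimal sketch of such a discrete scheme is given below. Several details are our assumptions rather than statements from the text: the update rule (a cell becomes differentiated if the summed influence of all source cells on it is positive, undifferentiated if it is negative, and is left unchanged if it is exactly zero), the wraparound treatment of the edges, and the particular radii, weights, and initial density, which are illustrative values only. The function names are hypothetical.

```python
import numpy as np


def young_kernel(r1, r2, w1, w2):
    """Constant weight w1 inside radius r1, constant weight w2 between r1 and r2."""
    r = int(np.ceil(r2))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.hypot(x, y)
    kernel = np.zeros(dist.shape)
    kernel[dist <= r1] = w1
    kernel[(dist > r1) & (dist <= r2)] = w2
    return kernel


def young_step(cells, kernel):
    """One synchronous update of a 0/1 grid (edges wrap around: an assumption)."""
    r = kernel.shape[0] // 2
    field = np.zeros(cells.shape)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            w = kernel[dy + r, dx + r]
            if w != 0.0:
                # Each source cell adds w to the cell displaced by (dy, dx).
                field += w * np.roll(cells, shift=(dy, dx), axis=(0, 1))
    new = cells.copy()
    new[field > 0] = 1   # net activation: cell differentiates
    new[field < 0] = 0   # net inhibition: cell does not
    return new


def run_young(shape=(64, 64), p_active=0.1, iterations=5,
              w1=1.0, w2=-0.3, r1=2.3, r2=6.01, seed=0):
    """Randomly activate some cells, then iterate the update a few times."""
    rng = np.random.default_rng(seed)
    cells = (rng.random(shape) < p_active).astype(int)
    kernel = young_kernel(r1, r2, w1, w2)
    for _ in range(iterations):
        cells = young_step(cells, kernel)
    return cells
```

No diffusion equation is solved anywhere in this scheme; each iteration is a single weighted neighborhood sum followed by a threshold, which is why so few iterations are needed.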

The  Young  mechanism  produces  spots  or  irregular  stripes,  depending  on  the 
ratio  of  activator  and  inhibitor  levels  (Figure  7).  Small  values  produce  spots,  while 
large  values  of  the  ratio  produce  stripes.  The  size  of  these  pattern  elements  is 
determined  by  the  range  of  the  activator,  while  their  spacing  is  determined  in  large 
measure  by  the  range  of  the  inhibition.  One  of  the  chief  advantages  of  the  Young 
method  is  that  the  patterns  form  and  stabilize  after  only  three  or  four  iterations. 
This  mechanism  can  therefore  produce  patterns  far  more  rapidly  than  one  that 
depends  on  the  numerical  simulation  of  diffusion  (which  may  require  thousands  of 
iterations).

FIGURE 6 Discrete lateral-inhibition model of Young. (a) The typical lateral inhibition
system with continuously variable values of activator and inhibitor. (b) Young's
model with discrete and spatially constant values for activator and inhibitor. Each
differentiated cell exerts a constant short-range activating effect ($w_1$) and a constant
long-range inhibitory effect ($w_2$) on its neighbors. Reprinted by permission of the
publisher from "A Local Activator-Inhibitor Model of Vertebrate Skin Patterns" by
D. A. Young, Math. Biosci. 72, 51-58. Copyright © 1984 by Elsevier Science Publishing
Co., Inc.

FIGURE 7 Patterns produced by the Young model for different values of $w_2$ (while
$w_1$ is held constant at 1), after random activation of some cells. Reprinted by permission
of the publisher from "A Local Activator-Inhibitor Model of Vertebrate Skin Patterns" by
D. A. Young, Math. Biosci. 72, 51-58. Copyright © 1984 by Elsevier Science Publishing
Co., Inc.


RANDOM AND NON-RANDOM PATTERNS

The  patterns  produced  by  the  Young  mechanism  (Figure  7)  illustrate  one  of  the 
limitations  of  the  standard  approach  to  the  simulation  of  pattern  formation.  When 
patterning  is  initiated  by  random  perturbation  of  the  steady  state  (as  is  usually  done 
to  study  the  general  properties  of  a  given  reaction-diffusion  mechanism),  then  the 
pattern  produced  is  also  random.  These  patterns  thus  mimic  the  stripes  on  the  coats 
of  zebras,  or  the  spotting  patterns  of  cheetahs,  leopards,  and  giraffes,  all  of  which  are 
random and characterize the individual like fingerprints. Randomness is, in fact, the
hallmark  of  vertebrate  color  patterns,  and  of  certain  developmental  patterns  such 
as  the  interdigitating  ocular  dominance  stripes  in  the  vertebrate  visual  cortex.  The 
vast  majority  of  patterns  in  development,  however,  are  regular  and  are  reproduced 
identically  from  individual  to  individual.  To  obtain  regularity  and  repeatability,  it  is 
necessary  to  define  the  boundary  conditions  and  initial  conditions  by  a  non-random 
mechanism. The trick in modeling pattern formation in development is to find a
non-arbitrary means of defining initial and boundary conditions. This generally requires
substantial knowledge of the developmental biology of the system under study.
Thus, while reaction-diffusion mechanisms can make patterns that look remarkably
like those seen in nature, we can only accept a given pattern and mechanism as
representing nature in a significant and meaningful way if it is backed up by a body
of experimental evidence that gives us confidence that we have applied the correct
boundary conditions.


RESULTS  OF  SIMULATIONS  IN  TWO  DIMENSIONS 

We  use  numerical  simulation  methods  to  illustrate  some  of  the  differences  between 
the  three  reaction-diffusion  schemes  discussed  above.  The  field  dimensions  and 
boundary  conditions  used  in  these  examples  were  chosen  because  they  define  a 
problem  of  biological  interest,  namely  the  formation  of  butterfly  wing  patterns. 
We  will  first,  however,  examine  the  behavior  of  the  models  before  illustrating  their 
application  to  a  biological  problem. 

Figures 8, 9, and 10 illustrate the behaviors, respectively, of the nondimensionalized
Schnakenberg, Meinhardt, and Thomas systems subject to the same initial
and boundary conditions. The field is a (1 × 2) rectangle, with fixed boundaries
on one short side and the two long sides, and no-flux conditions at the remaining
short side. Initial conditions were the unperturbed steady state. The figures show
the near steady state concentration of the activator that develops after setting the
fixed boundaries to 1.1 times the initial steady state. Each panel explores the $d$/$\gamma$
parameter space. It will be recalled that an increase in the parameter $\gamma$ can be
interpreted as an increase in the size of the field, while an increase in parameter
$d$ represents an increase in the range of the inhibitor. The patterns produced by
fixed boundary conditions on all four sides can be visualized by reflection on the
horizontal midline of each figure.

FIGURE 9 Patterns produced for various values of $d$ and $\gamma$ by the Meinhardt system
(Eq. (8)) in two spatial dimensions.
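The initial and boundary conditions just described can be set up in a few lines. The sketch below is not the code used to produce Figures 8 to 10: it assumes the nondimensional Schnakenberg kinetics, an explicit Euler time step on a coarse grid, clamping of both species (not only the activator) at the fixed boundaries, and illustrative values for every parameter, grid size, and time step. The function names are ours.

```python
import numpy as np


def laplacian(z, h):
    """Five-point Laplacian; reflecting padding supplies zero-flux ghost cells."""
    p = np.pad(z, 1, mode="reflect")
    return (p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2] - 4.0 * z) / h**2


def apply_fixed_boundaries(z, value):
    """Clamp one short side (row 0) and both long sides (first and last column)."""
    z[0, :] = value
    z[:, 0] = value
    z[:, -1] = value


def simulate(a=0.1, b=0.9, d=10.0, gamma=100.0, n=10, dt=1e-4, steps=100):
    """Integrate the system on a 1 x 2 field; the last row is the no-flux side."""
    h = 1.0 / n                                  # grid spacing
    u0, v0 = a + b, b / (a + b) ** 2             # homogeneous steady state
    u = np.full((2 * n + 1, n + 1), u0)          # start at the unperturbed steady state
    v = np.full((2 * n + 1, n + 1), v0)
    apply_fixed_boundaries(u, 1.1 * u0)          # fixed sides at 1.1 x steady state
    apply_fixed_boundaries(v, 1.1 * v0)          # assumption: inhibitor clamped too
    for _ in range(steps):
        f = a - u + u**2 * v
        g = b - u**2 * v
        u_new = u + dt * (gamma * f + laplacian(u, h))
        v_new = v + dt * (gamma * g + d * laplacian(v, h))
        apply_fixed_boundaries(u_new, 1.1 * u0)
        apply_fixed_boundaries(v_new, 1.1 * v0)
        u, v = u_new, v_new
    return u, v
```

The pattern here is seeded entirely by the elevated boundary values rather than by random noise, so repeated runs give identical results. An explicit scheme of this kind needs a time step small enough that `d * dt / h**2` stays below 1/4, and reaching a near steady state takes far more steps than the default shown.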

It  is  obvious  that  the  three  mechanisms  produce  dramatically  different  patterns. 
The  Thomas  and  Schnakenberg  systems  produce  mostly  linear  patterns,  while  the 
Meinhardt  mechanism  stabilizes  as  point  patterns.  The  patterns  produced  by  the 
Thomas and Schnakenberg systems differ considerably in detail. The Thomas patterns
are relatively simple lines, while the Schnakenberg patterns tend to develop
bulges and isolated islands of activator concentration. It is possible to get an idea of
the sensitivity of these systems to variation in parameters and field size by noting
the changes in pattern that are associated with, say, a 10% change in $d$ or $\gamma$. On
the whole, variation of this magnitude has relatively little effect on the pattern.

It is evident that the three reaction-diffusion systems are far from equivalent,
even though linear theory predicts the same general behavior for all three systems.
The details of the patterns produced by each, and the characteristic differences
between them, can only be uncovered by simulation. This means that there is no
way of using the information in Figures 8 to 10 to predict how these three systems
will behave under different boundary conditions. We can be assured that each will
produce characteristically different patterns, but their form cannot be predicted
without simulation.

FIGURE 10 Patterns produced for various values of $d$ and $\gamma$ by the Thomas system
(Eq. (9)) in two spatial dimensions.

Both the one-dimensional simulations of Arcuri and Murray and the two-dimensional
simulations shown above illustrate that the full nonlinear systems produce
patterns whose details differ significantly (and often dramatically) from those
predicted by linear theory. For most developmental systems the details of the pattern
are more important than its general features, and this means that each biological
problem in which reaction diffusion is believed to play a role must be studied
by full simulation of the nonlinear system.


CELLULAR  AUTOMATA 

We  conclude  the  general  section  on  pattern  formation  with  a  brief  discussion  of  the 
usefulness  of  cellular  automata  for  simulating  pattern  formation  in  development. 
In their pure form, cellular automata are points in space which can take on one of
two values (0 or 1) depending on the values of other such points in their neighborhood.
The rules of a cellular automaton determine how the values of neighbors are
interpreted. With relatively simple rules operating on such binary automata, it is
possible to produce a vast array of complicated patterns that have fascinated
mathematicians and biologists for nearly a decade (e.g., Wolfram). Such automata have
been used, among other things, to simulate the color patterns on mollusk shells, and the
branching patterns of algae. Spiral waves, such as those of the Belousov-Zhabotinski
reaction,  and  interdigitating  patterns,  resembling  ocular  dominance  stripes,  are 
particularly  easy  to  mimic  and  emerge  from  a  variety  of  automata. 
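A binary automaton of the pure kind described above takes only a few lines to write. The sketch below is a generic one-dimensional, nearest-neighbor automaton, not a model of any particular biological pattern: the rule is encoded as an 8-bit number in the manner popularized by Wolfram, the edges wrap around, and the function names and the choice of rule 90 in the usage example are ours.

```python
import numpy as np


def ca_step(cells, rule):
    """One update of a binary, nearest-neighbor automaton with wraparound edges."""
    left = np.roll(cells, 1)
    right = np.roll(cells, -1)
    index = 4 * left + 2 * cells + right   # neighborhood read as a 3-bit number
    table = (rule >> np.arange(8)) & 1     # bit i of the rule is the output for neighborhood i
    return table[index]


def ca_run(width, steps, rule):
    """Start from a single active cell and record every generation."""
    cells = np.zeros(width, dtype=int)
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = ca_step(cells, rule)
        history.append(cells)
    return np.array(history)


if __name__ == "__main__":
    # Rule 90: each cell becomes the exclusive-or of its two neighbors.
    for row in ca_run(width=31, steps=15, rule=90):
        print("".join("#" if c else "." for c in row))
```

The whole "developmental program" of such an automaton is the eight-entry lookup table, which illustrates both the appeal of the approach and the difficulty, discussed below, of relating such a table to known biological processes.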

Cellular automata are attractive for biological simulation because they evoke an
immediate image of biological cells, each with a fairly simple repertoire of behaviors,
but collectively capable of complex morphogenesis. Cellular automata can serve
as models of biological pattern-formation systems because biological cells, too,
behave by interacting only with their immediate neighbors, while obeying some set of
internal "rules." The complex patterns that appear during development are emergent
properties of the interaction of those rules with their cellular and chemical
environment. Many theoretical biologists are, however, reluctant to accept cellular
automata models because the formal rules are difficult to analogize to known
biological processes, and because there exist as yet no general methods for translating
biological interactions into a table of local rules. Thus, while cellular automata
can produce biologically realistic patterns, they often offer little insight into the
biological process. In other words, getting the right pattern is of no use, if it is
obtained for the wrong reason (a caveat that applies, obviously, to all theoretical
modeling in biology).

Cellular automata can, however, be easily extended to increase their biological
realism. Each point (or cell) in an array can be assumed to take on a continuous
range of values, and can possess values in more than one variable. The rules by
which these values change can reflect the interactions between cells, such as receptor
binding, competition, or diffusion, and any number of biochemical reactions.
Clearly, with such extensions cellular automata begin to resemble the methods
used for numerical simulation. The main difference is that cellular automata do
not attempt to model a differential equation (though they may). Such complex
automata are useful for biologists because they can directly model communication
between cells, and they allow examination of the consequences of qualitative and
quantitative rules of interaction.


SIMULATION  AND  MIMICRY 

Cellular  automata,  like  reaction-diffusion  systems,  are  useful  only  to  the  extent 
that  they  give  insights  into  the  biology  of  the  system  that  is  being  simulated. 
In  this  regard  it  is  perhaps  useful  to  make  a  distinction  between  simulation  and 
mimicry.  In  simulation  the  theoretical  model  grasps  and  accurately  summarizes  the 
principles  behind  the  process  being  simulated,  while  in  mimicry  the  model  is  wrong 
even though it produces the right kind of pattern. Mimicry in theoretical modeling
commits what statisticians would call a type 2 error: accepting a false hypothesis,
or in this case, getting the right answer for the wrong reason.

Unfortunately,  much  modeling  in  theoretical  developmental  biology  appears  at 
present to be mimicry. In developmental modeling it is easy to get the right kinds
of  pattern  for  the  wrong  reason  because  certain  categories  of  biologically  reasonable 
patterns  (zebra  stripes,  ocular  dominance  stripes,  sea  shell  patterns)  emerge  readily 
from  a  variety  of  reaction-diffusion  and  cellular  automata  models.  In  most  modeled 
systems,  we  simply  do  not  know  enough  about  the  developmental  physiology  to 
make  sensible  choices  between  alternative  models,  and,  even  when  we  can  imagine 
only  one  model  mechanism,  we  cannot  be  sure  it  has  captured  the  essence  of  the 
underlying  process. 

In  order  for  a  model  to  be  biologically  useful,  it  must  obviously  incorporate  as 
much information as possible about the developmental physiology of the system.
But  that  is  generally  not  sufficient.  In  order  to  have  reasonable  assurance  that 
a  model  has  captured  the  essence  of  a  process,  it  must  produce  a  pattern  whose 
details  resemble  those  of  the  morphology  being  modeled,  it  must  also  reproduce  in 
its  dynamics  reasonable  portions  of  the  ontogenetic  transformation  that  the  real 
pattern  undergoes,  and  because  morphological  evolution  is  gradualistic,  it  must  be 
able  to  produce  by  simple  changes  of  parameter  values  (and  not  by  adding  more 
terms  to  the  model)  a  range  of  diversity  of  the  pattern  identical  to  that  found  to 
occur in nature. Few models meet these expectations.


PATTERN FORMATION ON BUTTERFLY WINGS

Here  we  briefly  discuss  pattern  formation  on  the  wings  of  butterflies  as  a  concrete 
example  of  color  pattern  formation  because  it  is  one  of  the  few  systems  that  meets 
the  expectations  of  physiology,  detail,  ontogeny,  and  diversity,  mentioned  above.  It 
has  the  added  advantage  that  the  patterns  are  strictly  two  dimensional,  exhibit  an 
evolved  system  of  homologous  elements  with  transformations  across  the  thousands 
of  species  of  butterflies,  and  can  be  easily  modeled  without  having  to  collapse  any 
dimensions.  This  system  has  provided  a  variety  of  insights  into  the  way  in  which 
developmental  processes  change  during  morphological  evolution. 


FIGURE 11 The nymphalid ground plan. This is a diagrammatic representation of the
general distribution of pattern elements (labeled b-j) on the wings of butterflies. The
pattern elements are arranged in serially homologous series that repeat from wing
cell to wing cell. (From Nijhout; reprinted by permission of the author.)

FIGURE 12 Hypothetical evolution of the nymphalid ground plan from ancestors with a
few simple uncompartmentalized symmetry systems.


The  color  patterns  of  butterflies  ^ire  all  variants  on  a  theme  of  homologies  called 
the  nymphalid  ground  plan  (Figure  11).  ihe  entire  diversity  of  color  patterns  comes 
about  through  the  selective  expression  and  modification  of  the  individual  pattern 
elements  that  make  up  the  ground  plan.  The  wing  pattern  is  compartmentalized 
into  two  developmentally  independent  systems.  First,  the  overall  pattern  is  divided 
into  three  parallel  symmetry  systems:  the  basal  symmetry  system  (elements  b  and 
c,  in  Figure  11),  the  central  symmetry  system  (elements  d  and  f),  and  the  border 
symmetry  system  (elements  g  and  i).  In  the  centers  of  the  latter  two  systems,  there 
are  two  additional  pattern  elements,  the  discal  spot  (element  e)  and  the  border 
ocelli  (element  h).  Secondly,  the  development  of  the  elements  of  these  symmetry 
systems  within  a  given  wing  cell  is  uncoupled  from  that  in  adjoining  wing  cells.  As 
a  consequence  of  this  developmental  isolation,  each  element  of  the  pattern  has  been 
free  to  evolve  morphologically  with  nearly  complete  independence  from  the  other 
pattern elements. The overall wing pattern is thus a mosaic of semi-independent
pattern elements that can be modified and arranged on the wing surface to provide
a variety of optical effects, ranging from camouflage to mimicry.

The presumptive evolution of the nymphalid ground plan is illustrated
diagrammatically in Figure 12. The ancestor is believed to have had a simple pattern
with a single symmetry system, as is found in many species of moths today.
Evolution of complexity progressed by the addition of more symmetry systems (Figures
12(b)-(d)), possibly by a system that sets up an increasing number of standing
waves on the wing. The number of symmetry systems became stabilized at three,
and each gradually evolved a distinctive morphology (Figures 12(e)-(h)), probably
due to the evolution of a proximo-distal gradient or discontinuity in some variables
that interact with the wave pattern. In the immediate ancestors of the butterflies,
the wing veins became boundaries to pattern formation and the pattern became
compartmentalized to each wing cell (Figures 12(f)-(h)). With this developmental
isolation the pattern elements in each wing cell became free to diverge both in
position (Figures 12(f) and (g)) and morphology (Figures 12(g) and (h)).

The developmental compartmentalization of the wing pattern greatly facilitates
its modeling, because each pattern element in each wing cell can be modeled
separately without having to worry about possible interactions with distant patterns.
Nijhout^®’^*^  has  shown  that  a  relatively  simple  model  can  account  for  nearly  the 
entire  diversity  of  shapes  of  pattern  elements  that  are  found  among  the  thousands 
of  species  of  butterflies.  The  model  generates  the  pattern  in  two  steps,  in  accordcince 
with  what  is  known  about  the  developmental  physiology  of  pattern  formation  in 
this  system.  The  first  step  establishes  a  system  of  line  and  point  sources  of  a  dif¬ 
fusible  substance,  and  the  second  step  establishes  the  pattern  as  a  simple  threshold 
on  the  diffusion  gradients  produced  by  those  sources. 
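
The two steps can be sketched in a few lines of Python. This is only an illustrative sketch and not Nijhout's actual model: the grid size, the diffusion and decay constants, the no-flux edges, and the names `diffuse`, `threshold`, and `line_source` are all assumptions made for the example. Sources are held at a constant level, the substance spreads by a simple finite-difference diffusion step, and the pattern is whatever part of the field lies at or above a threshold.

```python
def diffuse(rows, cols, sources, steps=200, rate=0.2, decay=0.02):
    """Step 1: spread a substance from constant-level sources over a wing cell.

    `sources` is a list of (row, col) positions. All constants are
    illustrative assumptions, not values taken from the text.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for _ in range(steps):
        for r, c in sources:
            grid[r][c] = 1.0  # sources are held at a fixed concentration
        new = [row[:] for row in grid]
        for r in range(rows):
            for c in range(cols):
                here = grid[r][c]
                # A neighbor outside the field mirrors the cell itself (no-flux edge).
                nbrs = [
                    grid[r2][c2] if 0 <= r2 < rows and 0 <= c2 < cols else here
                    for r2, c2 in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                ]
                laplacian = sum(nbrs) - 4.0 * here
                new[r][c] = here + rate * laplacian - decay * here
        grid = new
    for r, c in sources:
        grid[r][c] = 1.0
    return grid


def threshold(grid, level):
    """Step 2: the pattern is the region where concentration >= level."""
    return [[1 if value >= level else 0 for value in row] for row in grid]


def line_source(col, row_start, row_end):
    """A line source is modeled here as a column of adjacent point sources."""
    return [(r, col) for r in range(row_start, row_end)]


if __name__ == "__main__":
    # One point source on the midline plus a short line source above it.
    field = diffuse(15, 11, [(10, 5)] + line_source(5, 0, 4))
    for row in threshold(field, 0.15):
        print("".join("#" if cell else "." for cell in row))
```

Changing only the threshold level or the set of active sources changes the shape of the resulting pattern, which is the sense in which a small repertoire of sources can yield many pattern shapes.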


Point Sources    Line Sources


FIGURE 13 Distribution of sources
(or sinks) that can produce nearly
the entire diversity of patterns found
in the wing cells of butterflies. The
rectangular field represents a single
wing cell in which veins make up the
two long side boundaries and the top
boundary.  (From  Nijhout^*^;  reprinted 
by  permission  of  the  author.) 

FIGURE 14 The lateral inhibition model of Meinhardt (Eq. (4)) can produce the
diversity of source distributions shown in Figure 13 by varying boundary conditions
and reaction constants. The series shown is a typical time sequence of activator
concentration, which gradually transforms from a high ridge to a series of point sources
on the wing-cell midline. (From Nijhout^^; reprinted by permission of the author.)


The  distribution  of  diffusion  sources  (and  barriers  to  diffusion)  in  real  butterfly 
wings is known from experimental perturbation studies and from studies of the
comparative  morphology  of  normal  and  aberrant  patterns.^®  When  activated  singly 
or  in  pairs,  this  distribution  of  sources  (Figure  13)  has  been  shown,  by  simulation, 
to  be  capable  of  producing  nearly  the  entire  diversity  of  pattern  shapes  found  in 
the  butterflies. 

Sources in the exact locations shown in Figure 13 are readily produced by the
Meinhardt^^  lateral  inhibition  system,  and  by  no  other  reaction-diffusion  system 
that  has  been  examined  so  far.^'’  The  Meinhardt  system  produces  the  right  patterns, 
but  only  when  provided  with  fixed  boundary  conditions  for  the  activator  on  three 
of  the  four  sides  of  the  rectangle  that  simulates  a  wing  cell.  These  are  the  three 
locations  of  the  wing  veins  around  a  typical  wing  cell.  The  wing  veins  afford  the 
only  means  by  which  material  can  enter  or  leave  the  developing  wing,  and  provide 
reasonable  physical  constant-level  sources  for  materials,  which  are  modeled  as  fixed 
boundaries. 

Perhaps the most important feature of the Meinhardt lateral inhibition system
implemented in this way is the dynamic progression of source distributions it
produces as the reaction-diffusion progresses (Figure 14). This progression of sources
produces patterns that closely resemble the diversity of color patterns seen among
closely related species in several genera of butterflies. Diversity of this type in
essence constitutes a heterochrony. This example illustrates that the most
interesting feature of reaction-diffusion systems, from a biological perspective, is probably
not the steady-state patterns to which a system tends, but the dynamic progression
of patterns well before the steady state is reached. Development, like most of
biology, is not an equilibrium phenomenon. Dynamically changing patterns like those
of evolving reaction-diffusion systems may provide useful models for the progression
of determinative processes during development.


Artificial Life


Artificial Life complements the traditional analytical biological methods
by attempting to synthesize lifelike behaviors within computers and other
“artificial” media. The primary motivations driving this synthetic approach
are (1) to contribute to a truly general theoretical biology by extending the
empirical data base beyond the carbon-based life that has evolved on the
planet Earth, and (2) to apply fundamental principles of biological form
and function to the solution of hard problems in science and engineering.


1.  THE  BIOLOGY  OF  POSSIBLE  LIFE 

Biology  is  the  scientific  study  of  life — in  principle  anyway.  In  practice,  biology  is  the 
scientific  study  of  life  on  Earth  based  on  carbon-chain  chemistry.  There  is  nothing 
in  its  charter  that  restricts  biology  to  the  study  of  carbon-based  life;  it  is  simply 
that this is the only kind of life that has been available to study. Thus, theoretical
biology  has  long  faced  the  fundamental  obstacle  that  it  is  impossible  to  derive 
general  principles  from  single  examples. 

Without other examples it is extremely difficult to distinguish essential
properties of life — properties that must be shared by any living system in principle — from
properties that may be incidental to life, but which happen to be universal to life
on Earth due solely to a combination of local historical accident and common
genetic descent. Since it is quite unlikely that organisms based on different physical
chemistries  will  present  themselves  to  us  for  study  in  the  foreseeable  future,  our 
only  alternative  is  to  try  to  synthesize  alternative  life-forms  ourselves — Artificial 
Life:  life  made  by  man  rather  than  by  nature. 


1.1  ARTIFICIAL  LIFE 

Biology  has  traditionally  started  at  the  top,  viewing  a  living  organism  as  a  complex 
biochemical  machine,  and  has  worked  analytically  down  from  there  through  the 
hierarchy of biological organization — decomposing a living organism into organs,
tissues, cells, organelles, and finally molecules — in its pursuit of the mechanisms
of life. Analysis means “the separation of an intellectual or substantial whole into
constituents  for  individual  study.”  By  composing  our  individual  understandings  of 
the  dissected  component  parts  of  living  organisms,  traditional  biology  has  provided 
us  with  a  broad  picture  of  the  mechanics  of  life  on  Earth. 

But there is more to life than mechanics — there is also dynamics. Life
depends critically on principles of dynamical self-organization that have remained
largely  untouched  by  traditional  analytic  methods.  There  is  a  simple  explanation 
for  this — these  self-organizing  dynamics  are  fundamentally  nonlinear  phenomena, 
and  nonlinear  phenomena  in  general  depend  critically  on  the  interactions  between 
parts: they necessarily disappear when parts are treated in isolation from one
another, which is the basis for the analytic method.

Rather,  nonlinear  phenomena  are  most  appropriately  treated  by  a  synthetic 
approach.  Synthesis  means  “the  combining  of  separate  elements  or  substances  to 
form a coherent whole.” In nonlinear systems, the parts must be treated in each
other’s  presence,  rather  than  independently  from  one  another,  because  they  behave 
very  differently  in  each  other’s  presence  than  we  would  expect  from  a  study  of  the 
parts  in  isolation. 

Artificial  Life  is  simply  the  synthetic  approach  to  biology:  rather  than  take  living 
things  apart,  Artificial  Life  attempts  to  put  living  things  together. 

But Artificial Life is more than this. To understand the overall aims of the
Artificial  Life  enterprise,  one  needs  to  do  the  following.  (1)  Broaden  the  scope  of 
the  attempts,  beyond  simply  recreating  “the  living  state,”  to  the  synthesis  of  any 
and  all  biological  phenomena,  from  viral  self-assembly  to  the  evolution  of  the  entire 
biosphere.  (2)  Couple  this  with  the  observation  that  there  is  no  reason,  in  principle, 
why  the  parts  we  use  in  our  attempts  to  synthesize  these  biological  phenomena  need 
be restricted to carbon-chain chemistry. (3) Note that we expect the synthetic
approach to lead us not only to, but quite often beyond, known biological phenomena:
beyond  life-as-we-know-it  into  the  realm  of  life-as-it-could-be. 

Thus, for example, Artificial Life involves attempts to (1) synthesize the process
of  evolution  (2)  in  computers,  and  (3)  will  be  interested  in  whatever  emerges  from 
the  process,  even  if  the  results  have  no  analogs  in  the  “natural”  world.  It  is  certainly 
of  scientific  interest  to  know  what  kinds  of  things  can  evolve  in  principle,  whether 
or  not  they  happened  to  do  so  here  on  Earth. 

1.2 AI AND THE BEHAVIOR GENERATION PROBLEM

Artificial  Life  is  concerned  with  generating  lifelike  behavior.  Thus,  it  focuses  on  the 
problem  of  creating  behavior  generators.  A  good  place  to  start  is  to  identify  the 
mechanisms  by  which  behavior  is  generated  and  controlled  in  natural  systems,  and 
to  recreate  these  mechanisms  in  artificial  systems.  This  is  the  course  we  will  take 
later  in  this  paper. 

The related field of Artificial Intelligence is concerned with generating
intelligent behavior. It, too, focuses on the problem of creating behavior generators.
However, although it initially looked to natural intelligence to identify its
underlying mechanisms, these mechanisms were not known, nor are they today. Therefore,
following an initial flirt with neural nets, AI became wedded to the only other known
vehicle for the generation of complex behavior: the technology of serial computer
programming. As a consequence, from the very beginning artificial intelligence
embraced an underlying methodology for the generation of intelligent behavior that
bore no demonstrable relationship to the method by which intelligence is generated
in natural systems. In fact, AI has focused primarily on the production of intelligent
solutions rather than on the production of intelligent behavior. There is a world of
difference between these two possible foci.

By contrast, Artificial Life has the great good fortune that many of the
mechanisms by which behavior arises in natural living systems are known. There are
still many holes in our knowledge, but the general picture is in place. Therefore,
Artificial Life can start by recapturing natural life and has no need to resort to the
sort of initial infidelity that is now coming back to haunt AI.

The key insight into the natural method of behavior generation is gained by
noting that nature is fundamentally parallel. This is reflected in the “architecture”
of natural living organisms, which consist of many millions of parts, each one of
which has its own behavioral repertoire. Living systems are highly distributed and
quite massively parallel. If our models are to be true to life, they must also be
highly distributed and quite massively parallel. Indeed, it is unlikely that any other
approach will prove viable.


2.  HISTORICAL  ROOTS  OF  ARTIFICIAL  LIFE 

Mankind  has  a  long  history  of  attempting  to  map  the  mechanics  of  his  contemporary 
technology  onto  the  workings  of  nature,  trying  to  understand  the  latter  in  terms  of 
the  former. 
It is not surprising, therefore, that early models of life reflected the principal
technology  of  their  era.  The  earliest  models  were  simple  statuettes  and  paintings — 
works  of  art  which  captured  the  static  form  of  living  things.  These  statues  were 
provided with articulated arms and legs in the attempt to capture the dynamic form
of living things. These simple statues incorporated no internal dynamics, requiring
human  operators  to  make  them  behave. 

The  earliest  mechanical  devices  that  were  capable  of  generating  their  own 
behavior  were  based  on  the  technology  of  water  transport.  These  were  the  early 
Egyptian  water  clocks  called  Clepsydra.  These  devices  made  use  of  a  rate-limited 
process — in  this  case  the  dripping  of  water  through  a  fixed  orifice — to  indicate  the 
progression  of  another  process — the  position  of  the  sun.  Ctesibius  of  Alexandria 
developed  a  water-powered  mechanical  clock  around  B.C.  135  which  employed  a 
great  deal  of  the  available  hydraulic  technology — including  floats,  a  siphon,  and  a 
water-wheel-driven  train  of  gears. 

In the first century A.D., Hero of Alexandria produced a treatise on Pneumatics,
which described, among other things, various simple gadgets in the shape of animals
and  humans  that  utilized  pneumatic  principles  to  generate  simple  movements. 

However, it was really not until the age of mechanical clocks that artifacts
exhibiting complicated internal dynamics became possible. Around 850 A.D., the
mechanical escapement was invented, which could be used to regulate the power
provided by falling weights. This invention ushered in the great age of clockwork
technology. Throughout the Middle Ages and the Renaissance, the history of
technology is largely bound up with the technology of clocks. Clocks often constituted
the most complicated and advanced application of the technology of an era.

Perhaps the earliest clockwork simulations of life were the so-called “Jacks,”
mechanical “men” incorporated in early clocks who would swing a hammer to strike
the hour on a bell. The word “jack” is derived from “jaccomarchiadus,” which means
“the man in the suit of armour.” These accessory figures retained their popularity
even after the spread of clock dials and hands — to the extent that clocks were
eventually developed in which the function of timekeeping was secondary to the
control of large numbers of figures engaged in various activities, to the point of
acting out entire plays.

Finally,  clockwork  mechanisms  appeared  which  had  done  away  altogether  with 
any  pretense  at  timekeeping.  These  “automata”  were  entirely  devoted  to  imparting 
lifelike motion to a mechanical figure or animal. These mechanical automaton
simulations of life included such things as elephants, peacocks, singing birds, musicians,
and  even  fortune  tellers. 

This line of development reached its peak in the famous duck of Vaucanson,
described as “an artificial duck made of gilded copper who drinks, eats, quacks,
splashes about on the water, and digests his food like a living duck.”[1] Vaucanson's
goal is captured neatly in the following description:

In 1735 Jacques de Vaucanson arrived in Paris at the age of 26. Under
the influence of contemporary philosophic ideas, he had tried, it seems, to
reproduce life artificially.

[1] All quotes concerning these mechanical ducks are from Chapuis.^

Unfortunately, neither the duck itself nor any technical descriptions or diagrams
remain that would give the details of construction of this duck. The complexity of
the mechanism is attested to by the fact that one single wing contained over 400
articulated pieces.

One of those called upon to repair Vaucanson's duck in later years was a
“mechanician” named Reichsteiner, who was so impressed with it that he went on to
build a duck of his own — also now lost — which was exhibited in 1847. Here is an
account of this duck's operation from the newspaper Das Freie Wort:

After a light touch on a point on the base, the duck in the most natural
way in the world begins to look around him, eyeing the audience with
an intelligent air. His lord and master, however, apparently interprets this
differently, for soon he goes off to look for something for the bird to eat. No
sooner has he filled a dish with oatmeal porridge than our famished friend
plunges his beak deep into it, showing his satisfaction by some characteristic
movements of his tail. The way in which he takes the porridge and swallows
it greedily is extraordinarily true to life. In next to no time the basin has
been half emptied, although on several occasions the bird, as if alarmed by
some unfamiliar noises, has raised his head and glanced curiously around
him. After this, satisfied with his frugal meal, he stands up and begins
to flap his wings and to stretch himself while expressing his gratitude by
several contented quacks. But most astonishing of all are the contractions
of the bird's body clearly showing that his stomach is a little upset by this
rapid meal and the effects of a painful digestion become obvious. However,
the brave little bird holds out, and after a few moments we are convinced
in the most concrete manner that he has overcome his internal difficulties.

The truth is that the smell which now spreads through the room becomes
almost unbearable. We wish to express to the artist inventor the pleasure
which his demonstration gave to us.

Figure  1  shows  two  views  of  one  of  the  ducks — there  is  some  controversy  as  to 
whether it is Vaucanson's or Reichsteiner's. The mechanism inside the duck would
have been completely covered with feathers and the controlling mechanism in the
box  below  would  have  been  covered  up  as  well. 

FIGURE 1 Two views of the mechanical duck attributed to Vaucanson. Printed in
Automata: A Historical and Technological Study by Alfred Chapuis and Edmond Droz
(B. A. Batsford Ltd.); reprinted by permission of the publisher.


2.1  THE  DEVELOPMENT  OF  CONTROL  MECHANISMS 

Out  of  the  technology  of  the  clockwork  regulation  of  automata  came  the  more 
general — and  perhaps  ultimately  more  important — technology  of  process  control. 
As  attested  to  in  the  descriptions  of  the  mechanical  ducks,  some  of  the  clockwork 
mechanisms had to control remarkably complicated actions on the part of the
automata, not only powering them but sequencing them as well.

Control mechanisms evolved from early, simple devices — such as a lever
attached to a wheel which converted circular motion into linear motion — to later,
more  complicated  devices — such  as  whole  sets  of  cams  upon  which  would  ride 
many  interlinked  mechanical  arms,  giving  rise  to  extremely  complicated  automaton 
behaviors. 

Eventually programmable controllers appeared, which incorporated such
devices as interchangeable cams, or drums with movable pegs, with which one could
program arbitrary sequences of actions on the part of the automaton. The
writing and picture drawing automata of Figure 2, built by the Jaquet-Droz family,
are  examples  of  programmable  automata.  The  introduction  of  such  programmable 
controllers  was  one  of  the  primary  developments  on  the  road  to  general  purpose 
computers. 



FIGURE 2 Two views of a drawing automaton built by the Jaquet-Droz family. Printed
in Automata: A Historical and Technological Study by Alfred Chapuis and Edmond
Droz (B. A. Batsford Ltd.); reprinted by permission of the publisher.


2.2  ABSTRACTION  OF  THE  LOGICAL  “FORM”  OF  MACHINES 

During the early part of the twentieth century, the formal application of logic to the
mechanical process of arithmetic led to the abstract formulation of a “procedure.”
The work of Church, Kleene, Gödel, Turing, and Post formalized the notion of a
logical sequence of steps, leading to the realization that the essence of a
mechanical process — the “thing” responsible for its dynamic behavior — is not a thing at
all, but an abstract control structure, or “program” — a sequence of simple actions
selected from a finite repertoire. Furthermore, it was recognized that the essential
features of this control structure could be captured within an abstract set of rules —
a formal specification — without regard to the material out of which the machine
was constructed. The “logical form” of a machine was separated from its material
basis of construction, and it was found that “machineness” was a property of the
former, not of the latter. Today, the formal equivalent of a “machine” is an
algorithm: the logic underlying the dynamics of an automaton, regardless of the details
of its material construction. We now have many formal methods for the
specification and operation of abstract machines, such as programming languages, formal
language theory, automata theory, recursive function theory, etc. All of these have
been shown to be logically equivalent.

Once  we  have  learned  to  think  of  machines  in  terms  of  their  abstract,  formal 
specifications, we can turn around and view abstract, formal specifications as
potential machines. In mapping the machines of our common experience to formal
specifications, we have by no means exhausted the space of possible specifications.
Indeed,  most  of  our  individual  machines  map  to  a  very  small  subset  of  the  space  of 
specifications — a  subset  largely  characterized  by  methodical,  boring,  uninteresting 
dynamics. 


2.3  GENERAL  PURPOSE  COMPUTERS 

Various threads of technological development — programmable controllers,
calculating engines, and the formal theory of machines — have come together in the general
purpose, stored program computer. Programmable computers are extremely
general behavior generators. They have no intrinsic behavior of their own. Without
programs, they are like formless matter. They must be told how to behave. By
submitting a program to a computer — that is, by giving it a formal specification
for a machine — we are telling it to behave as if it were the machine specified by
the program. The computer then “emulates” that more specific machine in the
performance of the desired task. Its great power lies in its plasticity of behavior.
If we can provide a step-by-step specification for a specific kind of behavior, the
chameleon-like computer will exhibit that behavior. Computers should be viewed
as second-order machines — given the formal specification of a first-order machine,
they will “become” that machine. Thus, the space of possible machines is directly
available for study, at the cost of a mere formal description: computers “realize”
abstract machines.
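
The idea of a second-order machine can be made concrete with a small Python sketch. It is an illustration under stated assumptions rather than anything taken from the text: the function `emulate`, the dictionary layout of a specification, and the two example machines (`parity` and `turnstile`) are hypothetical. The same emulator, handed two different formal specifications, behaves as two different first-order machines.

```python
def emulate(spec, inputs):
    """Behave as the finite-state machine described by `spec`.

    `spec` is a formal specification: a start state plus a table mapping
    (state, input symbol) to (next state, output). The layout is an
    assumption made for this sketch.
    """
    state = spec["start"]
    outputs = []
    for symbol in inputs:
        state, out = spec["table"][(state, symbol)]
        outputs.append(out)
    return outputs


# Hypothetical first-order machine 1: reports the parity of the 1s seen so far.
parity = {
    "start": "even",
    "table": {
        ("even", 0): ("even", "even"),
        ("even", 1): ("odd", "odd"),
        ("odd", 0): ("odd", "odd"),
        ("odd", 1): ("even", "even"),
    },
}

# Hypothetical first-order machine 2: a coin-operated turnstile.
turnstile = {
    "start": "locked",
    "table": {
        ("locked", "coin"): ("unlocked", "unlock"),
        ("locked", "push"): ("locked", "refuse"),
        ("unlocked", "coin"): ("unlocked", "refund"),
        ("unlocked", "push"): ("locked", "lock"),
    },
}

if __name__ == "__main__":
    print(emulate(parity, [1, 1, 0, 1]))
    print(emulate(turnstile, ["push", "coin", "push"]))
```

Nothing in `emulate` refers to parity or turnstiles; all of the machine-specific behavior lives in the description it is given.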


2.4  FORMAL  LIMITS  OF  MACHINE  BEHAVIORS 

Although  computers,  and  by  extension  other  machines,  are  capable  of  exhibiting  a 
bewilderingly wide variety of behaviors, we must face two fundamental limitations
on the kinds of behaviors that we can expect of computers.

The first limitation is one of computability in principle. There are certain
behaviors that are “uncomputable” — behaviors for which no formal specification can
be given for a machine that will exhibit that behavior. The classic example of this
sort of limitation is Turing's famous Halting Problem: can we give a formal
specification for a machine which, when provided with the description of any other machine
together with its initial state, will — by inspection alone — determine whether or not
that machine will reach its halt state? Turing proved that no such machine can
be specified. In particular, Turing showed that the best that such a proposed
machine could do would be to emulate the given machine to see whether or not it
halted. If the emulated machine halted, fine. However, the emulated machine might
run forever without halting, and therefore the emulating machine could not answer
whether or not it would halt. Rice and others have extended this undecidability
result  to  the  determination — by  inspection  alone — of  any  nontrivial  property  of  the 
future  behavior  of  an  arbitrary  machine.^'’ 
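
The asymmetry in this argument, where emulation can confirm that a machine halts but can never confirm that it does not, can be illustrated with a short Python sketch. The representation is an assumption made for the example: a machine is a step function that returns its next state, or `None` when it halts, and the names `run_with_budget`, `countdown`, `collatz`, and `forever` are hypothetical.

```python
def run_with_budget(step, state, max_steps):
    """Emulate a machine for at most `max_steps` transitions.

    Returns ("halted", n) if the machine halted after n transitions,
    or ("unknown", max_steps) if the budget ran out first. Running out
    of budget says nothing about whether the machine would halt later.
    """
    for n in range(max_steps):
        state = step(state)
        if state is None:
            return ("halted", n + 1)
    return ("unknown", max_steps)


def countdown(n):
    """Halts once the counter reaches zero."""
    return None if n == 0 else n - 1


def collatz(n):
    """Halts on reaching 1; how long that takes is hard to see by inspection."""
    if n == 1:
        return None
    return n // 2 if n % 2 == 0 else 3 * n + 1


def forever(n):
    """Never halts, but the emulator can only ever report 'unknown'."""
    return n + 1


if __name__ == "__main__":
    print(run_with_budget(countdown, 3, 100))
    print(run_with_budget(collatz, 27, 10))
    print(run_with_budget(collatz, 27, 1000))
    print(run_with_budget(forever, 0, 1000))
```

A larger budget can turn an "unknown" into a "halted", as with the `collatz` example, but no finite budget can turn it into a definite "never halts".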

The second limitation is one of computability in practice. There are many
behaviors for which we do not know how to specify a sequence of steps that will cause
a computer to exhibit that behavior. We can automate what we can explain how to
do,  but  there  is  much  that  we  cannot  explain  how  to  do.  Thus,  although  a  formal 
specification  for  a  machine  that  will  exhibit  a  certain  behavior  may  be  possible  in 
principle,  we  have  no  formal  procedure  for  producing  that  formal  specification  in 
practice,  short  of  a  trial  and  error  search  through  the  space  of  possible  descriptions. 

We need to separate the notion of a formal specification of a machine — that is,
a  specification  of  the  logical  structure  of  the  machine — from  the  notion  of  a  formal 
specification  of  a  machine’s  behavior — that  is,  a  specification  of  the  sequence  of 
transitions  that  the  machine  will  undergo.  In  general,  we  cannot  derive  behaviors 
from  structure,  nor  can  we  derive  structure  from  behaviors. 

The  moral  is:  in  order  to  determine  the  behavior  of  some  machines,  there  is 
no recourse but to run them and see how they behave! This has consequences
for  the  methods  by  which  we  (or  nature)  go  about  generating  behavior  generators 
themselves,  which  we  will  take  up  in  the  section  on  evolution. 


2.5  JOHN  VON  NEUMANN:  FROM  MECHANICS  TO  LOGIC 

With  the  development  of  the  general  purpose  computer,  various  researchers  turned 
their  attention  from  the  mechanics  of  life  to  the  logic  of  life. 

The  first  computational  approach  to  the  generation  of  lifelike  behavior  was  due 
to  the  brilliant  Hungarian  mathematician  John  von  Neumann.  In  the  words  of  his 
colleague Arthur W. Burks, von Neumann was interested in the general question[2]:

What kind of logical organization is sufficient for an automaton to
reproduce itself? This question is not precise and admits to trivial versions as well
as  interesting  ones.  Von  Neumann  had  the  familiar  natural  phenomenon 
of  self-reproduction  in  mind  when  he  posed  it,  but  he  was  not  trying  to 
simulate  the  self-reproduction  of  a  natural  system  at  the  level  of  genetics 
and  biochemistry.  He  wished  to  abstract  from  the  natural  self-reproduction 
problem  its  logical  form. 

This approach is the first to capture the essence of Artificial Life. To understand
the  field  of  Artificial  Life,  one  need  only  replace  references  to  “self-reproduction” 
in  the  above  with  references  to  any  other  biological  phenomenon. 

In  von  Neumann’s  initial  thought  experiment  (his  kinematic  model”),  a  ma¬ 
chine  floats  around  on  the  surface  of  a  pond,  together  with  lots  of  machine  parts. 
The  machine  is  a  universal  constructor:  given  the  description  of  any  machine,  it 
will  locate  the  proper  parts  and  construct  that  machine.  If  given  a  description  of 

l^lFVom  Burks, ^  emphasis  added. 
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itself,  it  will  construct  itself.  This  is  not  quite  self-reproduction,  however,  because 
the  offspring  machine  will  not  have  a  description  of  itself  and  hence  could  not  go  on 
to  construct  another  copy.  So,  von  Neumann’s  machine  also  contains  a  description 
copier: once the offspring machine has been constructed, the “parent” machine
constructs a copy of the description that it worked from and attaches it to the offspring
machine.  This  constitutes  genuine  self-reproduction. 

Von  Neumann  decided  that  this  model  did  not  properly  distinguish  the  logical 
form of the process from the material of the process, and looked about for a
completely formal system within which to model self-reproduction. Stan Ulam—one of
von Neumann’s colleagues at Los Alamos—suggested an appropriate formalism,
which  has  come  to  be  known  as  a  cellular  automaton  (CA). 

In  brief,  a  CA  consists  of  a  regular  lattice  of  finite  automata,  which  are  the 
simplest  formal  models  of  machines.  A  finite  automaton  can  be  in  only  one  of  a 
finite  number  of  states  at  any  given  time,  and  its  transitions  between  states  from 
one  time  step  to  the  next  are  governed  by  a  state-transition  table:  given  a  certain 
input  and  a  certain  internal  state,  the  state-transition  table  specifies  the  state  to 
be  adopted  by  the  finite  automaton  at  the  next  time  step.  In  a  CA,  the  necessary 
input  is  derived  from  the  states  of  the  automata  at  neighboring  lattice  points.  Thus, 
the state of an automaton at time t + 1 is a function of the states of the automaton
itself and its immediate neighbors at time t. All of the automata in the lattice obey
the  same  transition  table  and  every  automaton  changes  state  at  the  same  instant, 
time  step  after  time  step.  CA’s  are  good  examples  of  the  kind  of  computational 
paradigm  sought  after  by  Artificial  Life:  bottom-up,  parallel,  local  determination 
of  behavior. 
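The definition above can be made concrete with a minimal sketch. It assumes a one-dimensional, two-state lattice arranged in a ring, and a transition table in which each cell adopts the parity of its two neighbors. The lattice shape, the rule, and the names `make_table`, `step`, and `run` are illustrative assumptions, not von Neumann's 29-state construction.

```python
from typing import Dict, List, Tuple

# State-transition table: (left, self, right) at time t -> own state at t + 1.
Table = Dict[Tuple[int, int, int], int]


def make_table() -> Table:
    """Build an illustrative table: the next state is the parity of the neighbors."""
    return {(l, c, r): l ^ r for l in (0, 1) for c in (0, 1) for r in (0, 1)}


def step(cells: List[int], table: Table) -> List[int]:
    """Synchronous update: every cell reads states at time t, and all cells
    switch to their time t + 1 states together."""
    n = len(cells)
    return [table[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]


def run(cells: List[int], table: Table, steps: int) -> List[List[int]]:
    """Return the lattice configuration at every time step, starting at t = 0."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step(history[-1], table))
    return history


if __name__ == "__main__":
    start = [0, 0, 0, 1, 0, 0, 0]
    for row in run(start, make_table(), 3):
        print("".join(".#"[s] for s in row))
```

Every cell consults the same table and only its immediate neighborhood; nothing in the program refers to the pattern as a whole.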

Von  Neumann  was  able  to  embed  the  equivalent  of  his  kinematic  model  as  an 
initial  pattern  of  state  assignments  within  a  large  CA  lattice  using  29  states  per  cell. 
Although  von  Neumann’s  work  on  self-reproducing  automata  was  left  incomplete 
at  the  time  of  his  death,  Arthur  Burks  organized  what  had  been  done,  filled  in  the 
remaining details, and published it. Figure 3 shows a schematic diagram of von
Neumann’s  self-reproducing  machine. 

Von Neumann’s CA model was a constructive proof that an essential characteristic
behavior of living things—self-reproduction—was achievable by machines.
Furthermore, he determined that any such method must make use of the information
contained in the description of the machine in two fundamentally different
ways:

1. Interpreted, as instructions to be executed in the construction of the offspring.

2. Uninterpreted, as passive data to be duplicated to form the description given
   to the offspring.

Footnote: Ulam also investigated dynamic models of pattern production and competition.

Footnote: Burks published this work together with a transcription of von Neumann’s
1949 lectures at the University of Illinois entitled “Theory and Organization of
Complicated Automata,” in which von Neumann gives his views on various problems
related to the study of complex systems in general.

FIGURE 3 Schematic diagram of von Neumann’s CA self-reproducing configuration.
From Essays on Cellular Automata edited by A. W. Burks (University of Illinois Press,
Urbana, 1970); reprinted by permission of the publisher.


Of  course,  when  Watson  and  Crick  unveiled  the  structure  of  DNA,  they  discovered 
that  the  information  contained  therein  was  used  in  precisely  these  two  ways  in  the 
processes  of  transcription/translation  and  replication. 
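The two uses of the description can be illustrated with a toy sketch. The `Machine` record, the encoding of a description as a tuple of part names, and the part names themselves are hypothetical; the sketch only shows that the same data is read once as instructions and once as passive data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical encoding: a description is simply a tuple of part names.
Description = Tuple[str, ...]


@dataclass
class Machine:
    parts: Tuple[str, ...]
    description: Optional[Description] = None


def construct(description: Description) -> Machine:
    """Interpreted use: read the description as instructions, one part each."""
    return Machine(parts=tuple(description))


def copy_description(description: Description) -> Description:
    """Uninterpreted use: duplicate the description as passive data."""
    return tuple(description)


def reproduce(parent: Machine) -> Machine:
    """Build an offspring, then attach a copy of the description to it."""
    if parent.description is None:
        raise ValueError("a machine without a description cannot reproduce")
    child = construct(parent.description)
    child.description = copy_description(parent.description)
    return child
```

A machine produced by `construct` alone carries no description and cannot reproduce further, which is the gap that the description copier closes.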

In describing his model, von Neumann pointed out that:

By axiomatizing automata in this manner, one has thrown half of the problem
out the window, and it may be the more important half. One has resigned
oneself not to explain how these parts are made up of real things,
specifically,  how  these  parts  are  made  up  of  actual  elementary  particles,  or 
even  of  higher  chemical  molecules. 


Footnote: Quotation from Burks.

Whether or not the more important half of the question has been disposed of
depends on the questions we are asking. If we are concerned with explaining how
the life that we know emerges from the known laws of physics and organic chemistry,
then indeed the interesting part has been tossed out. But, if we are concerned with
the more general problem of explaining how lifelike behaviors emerge out of
low-level interactions within a population of logical primitives, we have retained the
more interesting portion of the question.


3.  THE  ROLE  OF  COMPUTERS  IN  STUDYING  LIFE  AND 
OTHER  COMPLEX  SYSTEMS 

Artificial  Intelligence  and  Artificial  Life  are  each  concerned  with  the  application  of 
computers  to  the  study  of  complex,  natural  phenomena.  Both  are  concerned  with 
generating  complex  behavior.  However,  the  manner  in  which  each  field  employs 
the technology of computation in the pursuit of its respective goals is strikingly
different. 

AI  has  based  its  underlying  methodology  for  generating  intelligent  behavior 
on  the  computational  paradigm.  That  is,  AI  uses  the  technology  of  computation 
as a model of intelligence. AL, on the other hand, is attempting to develop a new
computational paradigm based on the natural processes that support living
organisms. That is, AL uses insights from biology to explore the dynamics of
interacting information structures. AL has not adopted the computational paradigm
as its underlying methodology of behavior generation, nor does it attempt to
“explain” life as a kind of computer program.

One  way  to  pursue  the  study  of  artificial  life  would  be  to  attempt  to  create  life 
in  vitro,  using  the  same  kinds  of  organic  chemicals  out  of  which  we  are  constituted. 
Indeed,  there  are  numerous  exciting  efforts  in  this  direction.  This  would  certainly 
teach us a lot about the possibilities for alternative life-forms within the
carbon-chain chemistry domain that could have evolved here (but didn’t).

However,  biomolecules  are  extremely  small  and  difficult  to  work  with,  requiring 
rooms  full  of  special  equipment,  replete  with  dozens  of  “postdocs”  and  graduate 
students willing to devote the larger part of their professional careers to the
perfection of electrophoretic gel techniques. Besides, although the creation of life in
vitro  would  certainly  be  a  scientific  feat  worthy  of  note — and  probably  even  a  Nobel 
prize — it  would  not,  in  the  long  run,  tell  us  much  more  about  the  space  of  possible 
life  than  we  already  know. 

Computers provide an alternative medium within which to attempt to synthesize
life. Modern computer technology has resulted in machinery with tremendous
potential  for  the  creation  of  life  in  silico. 

Computers should be thought of as an important laboratory tool for the study
of life, substituting for the array of incubators, culture dishes, microscopes,
electrophoretic gels, pipettes, centrifuges, and other assorted wet-lab paraphernalia,
one simple-to-master piece of experimental equipment devoted exclusively to the
incubation of information structures.

The  advantage  of  working  with  information  structures  is  that  information  has 
no  intrinsic  size.  The  computer  is  the  tool  for  the  manipulation  of  information, 
whether  that  manipulation  is  a  consequence  of  our  actions  or  a  consequence  of  the 
actions  of  the  information  structures  themselves.  Computers  themselves  will  not 
be  alive,  rather  they  will  support  informational  universes  within  which  dynamic 
populations of informational “molecules” engage in informational “biochemistry.”

This view of computers as workstations for performing scientific experiments
within artificial universes is fairly new, but it is rapidly becoming accepted as a
legitimate,  even  necessary,  way  of  pursuing  science.  In  the  days  before  computers, 
scientists  worked  primarily  with  systems  whose  defining  equations  could  be  solved 
analytically,  and  ignored  those  whose  defining  equations  could  not  be  so  solved. 
This  was  largely  the  case  because,  in  the  absence  of  analytic  solutions,  the  equations 
would  have  to  be  integrated  over  and  over  again,  essentially  simulating  the  time 
behavior  of  the  system.  Without  computers  to  handle  the  mundane  details  of  these 
calculations,  such  an  undertaking  was  unthinkable  except  in  the  simplest  cases. 

However,  with  the  advent  of  computers,  the  necessary  mundane  calculations  can 
be  relegated  to  these  idiot-savants,  and  the  realm  of  numerical  simulation  is  opened 
up for exploration. “Exploration” is an appropriate term for the process, because the
numerical simulation of systems allows one to “explore” the system’s behavior under
a wide range of parameter settings and initial conditions. The heuristic value of
this kind of experimentation cannot be overestimated. One often gains tremendous
insight into the essential dynamics of a system by observing its behavior under a wide
range  of  initial  conditions.  Most  importantly,  however,  computers  are  beginning 
to  provide  scientists  with  a  new  paradigm  for  modeling  the  world.  When  dealing 
with  essentially  unsolvable  governing  equations,  the  primary  reason  for  producing  a 
formal  mathematical  model — the  hope  of  reaching  an  analytic  solution  by  symbolic 
manipulation — is  lost.  Systems  of  ordinary  and  partial  differential  equations  are 
not  very  well  suited  for  implementation  as  computer  algorithms.  One  might  expect 
that  other  modeling  technologies  would  be  more  appropriate  when  the  goal  is  the 
synthesis, rather than the analysis, of behavior.
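As a small illustration of exploring a system under a range of parameter settings and initial conditions, the sketch below iterates a simple nonlinear update rule, x -> r x (1 - x). This rule (the logistic map) is not discussed in the text; it is chosen here only because it is short, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def simulate(r: float, x0: float, steps: int) -> List[float]:
    """Iterate x -> r * x * (1 - x) and record the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def explore(rs: Iterable[float], x0s: Iterable[float],
            steps: int = 1000) -> Dict[Tuple[float, float], List[float]]:
    """Run the simulation over a grid of parameter values and initial
    conditions, keeping the last two states of each run."""
    x0s = list(x0s)
    return {(r, x0): simulate(r, x0, steps)[-2:] for r in rs for x0 in x0s}


if __name__ == "__main__":
    for (r, x0), tail in explore([2.0, 3.2], [0.2, 0.7]).items():
        print(f"r={r} x0={x0} -> last two states {tail}")
```

With these settings the runs at r = 2.0 settle on a single value from either starting point, while the runs at r = 3.2 keep alternating between two values, a difference that is easy to observe by simulation even though nothing in the update rule announces it.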

This expectation is easily borne out. With the precipitous drop in the cost of
raw computing power, computers are now available that are capable of simulating
physical systems from first principles. This means that it has become possible, for
example, to model turbulent flow in a fluid by simulating the motions of its
constituent particles—not just approximating changes in concentrations of particles at
particular points, but actually computing their motions exactly.

Footnote: See Toffoli for a good exposition.

What does all of this have to do with the study of life? The most surprising
lesson we have learned from simulating complex physical systems on computers is that
complex behavior need not have complex roots. Indeed, tremendously interesting and
beguilingly complex behavior can emerge from collections of extremely simple
components. This leads directly to the exciting possibility that much of the complex
behavior exhibited by nature—especially the complex behavior that we call life—
also has simple generators. Since it is very hard to work backwards from a complex
behavior to its generator, but very simple to create generators and synthesize
complex behavior, a promising approach to the study of complex natural systems is
to undertake the general study of the kinds of behavior that can emerge from
distributed systems consisting of simple components (Figure 4).

FIGURE 4 The bottom-up versus the top-down approach to modeling complex
systems. Original figure appeared in “Artificial Life” by Christopher Langton, in Artificial
Life edited by C. Langton (Addison-Wesley, Redwood City, 1989).


4. NONLINEARITY AND LOCAL DETERMINATION OF
BEHAVIOR

4.1  LINEAR  VS.  NONLINEAR  SYSTEMS 

As  mentioned  briefly  above,  the  distinction  between  linear  and  nonlinear  systems  is 
fundamental,  and  provides  excellent  insight  into  why  the  principles  underlying  the 
dynamics  of  life  should  be  so  hard  to  find.  The  simplest  way  to  state  the  distinction 
is  to  say  that  linear  systems  are  those  for  which  the  behavior  of  the  whole  is  just 
the  sum  of  the  behavior  of  its  parts,  while  for  nonlinear  systems,  the  behavior  of 
the whole is more than the sum of its parts.

Linear systems are those which obey the principle of superposition. We can
break  up  complicated  linear  systems  into  simpler  constituent  parts,  and  analyze 
these  parts  independently.  Once  we  have  reached  an  understanding  of  the  parts  in 
isolation,  we  can  achieve  a  full  understanding  of  the  whole  system  by  composing 
our  understandings  of  the  isolated  parts.  This  is  the  key  feature  of  linear  systems: 
by  studying  the  parts  in  isolation,  we  can  learn  everything  we  need  to  know  about 
the  complete  system. 

This  is  not  possible  for  nonlinear  systems,  which  do  not  obey  the  principle 
of  superposition.  Even  if  we  could  break  such  systems  up  into  simpler  constituent 
parts,  and  even  if  we  could  reach  a  complete  understanding  of  the  parts  in  isolation, 
we  would  not  be  able  to  compose  our  understandings  of  the  individual  parts  into 
an understanding of the whole system. The key feature of nonlinear systems is that
their  primary  behaviors  of  interest  are  properties  of  the  interactions  between  parts, 
rather  than  being  properties  of  the  parts  themselves,  and  these  interaction-based 
properties  necessarily  disappear  when  the  parts  are  studied  independently. 
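A minimal numerical sketch of the superposition principle follows. The two response functions are arbitrary examples chosen for illustration, and the helper names are hypothetical.

```python
from typing import Callable


def linear(x: float) -> float:
    """Response of an example linear system."""
    return 3.0 * x


def nonlinear(x: float) -> float:
    """Response of an example nonlinear system."""
    return x * x


def interaction_term(f: Callable[[float], float], a: float, b: float) -> float:
    """What the whole does beyond the sum of its parts: f(a + b) - f(a) - f(b)."""
    return f(a + b) - f(a) - f(b)


def obeys_superposition(f: Callable[[float], float], a: float, b: float) -> bool:
    """True when the response to a + b equals the sum of the separate responses."""
    return abs(interaction_term(f, a, b)) < 1e-12


if __name__ == "__main__":
    print(obeys_superposition(linear, 2.0, 5.0))
    print(obeys_superposition(nonlinear, 2.0, 5.0))
    print(interaction_term(nonlinear, 2.0, 5.0))
```

For the squaring function the interaction term is 2ab. It belongs to neither input alone and vanishes as soon as either input is studied in isolation, which is the sense in which interaction-based properties disappear under analysis.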

Thus,  analysis  is  most  fruitfully  applied  to  linear  systems.  Analysis  has  not 
proved  anywhere  near  as  effective  when  applied  to  nonlinear  systems:  the  nonlinear 
system  must  be  treated  as  a  whole. 

A  different  approach  to  the  study  of  nonlinear  systems  involves  the  inverse  of 
analysis: synthesis. Rather than starting with the behavior of interest and attempting
to  analyze  it  into  its  constituent  parts,  we  start  with  constituent  parts  and  put 
them  together  in  the  attempt  to  synthesize  the  behavior  of  interest. 

Life  is  a  property  of  form,  not  matter,  a  result  of  the  organization  of  matter 
rather  than  something  that  inheres  in  the  matter  itself.  Neither  nucleotides  nor 
amino  acids  nor  any  other  carbon-chain  molecule  is  alive — yet  put  them  together 
in  the  right  way,  and  the  dynamic  behavior  that  emerges  out  of  their  interactions 
is  what  we  call  life.  It  is  effects,  not  things,  upon  which  life  is  based — life  is  a  kind 
of  behavior,  not  a  kind  of  stuff — and  as  such,  it  is  constituted  of  simpler  behaviors, 
not  simpler  stuff.  Behaviors  themselves  can  constitute  the  fundamental  parts  of 
nonlinear  systems — virtual  parts,  which  depend  on  nonlinear  interactions  between 
physical  parts  for  their  very  existence.  Isolate  the  physical  parts  and  the  virtual 
parts  cease  to  exist.  It  is  the  virtual  parts  of  living  systems  that  Artificial  Life  is 
after,  and  synthesis  is  its  primary  methodological  tool. 


4.2 THE PARSIMONY OF LOCAL DETERMINATION OF BEHAVIOR

It  is  easier  to  generate  complex  behavior  from  the  application  of  simple,  local  rules 
than  it  is  to  generate  complex  behavior  from  the  application  of  complex,  global  rules. 
This  is  because  complex  global  behavior  is  usually  due  to  nonlinear  interactions 
occurring  at  the  local  level.  With  bottom-up  specifications,  the  system  computes  the 
local,  nonlinear  interactions  explicitly  and  the  global  behavior,  which  was  implicit 
in  the  local  rules,  emerges  spontaneously  without  being  treated  explicitly. 

With top-down specifications, however, local behavior must be implicit in global
rules! This is really putting the cart before the horse! The global rules must
“predict” the effects on global structure of many local, nonlinear interactions—
something which we have seen is intractable, even impossible, in the general case.
Thus, top-down systems must take computational shortcuts and explicitly deal with
special cases, which results in inflexible, brittle, and unnatural behavior.

Furthermore,  in  a  system  of  any  complexity,  the  number  of  possible  global 
states  is  astronomically  enormous,  and  grows  exponentially  with  the  size  of  the 
system.  Systems  that  attempt  to  supply  global  rules  for  global  behavior  simply 
cannot  provide  a  different  rule  for  every  global  state.  Thus,  the  global  states  must  be 
classified  in  some  manner,  and  categorized  using  a  coarse-grained  scheme  according 
to  which  the  global  states  within  a  category  are  indistinguishable.  The  rules  of  the 
system  can  only  be  applied  at  the  level  of  resolution  of  these  categories.  There 
are  many  possible  ways  to  implement  a  classification  scheme,  most  of  which  will 
yield  different  partitionings  of  the  global  state  space.  Any  rule-based  system  must 
necessarily  assume  that  finer-grained  differences  don’t  matter,  or  must  include  a 
finite  set  of  tests  for  “special  cases,”  and  then  must  assume  that  no  other  special 
cases  are  relevant. 

For  most  complex  systems,  however,  fine  differences  in  the  global  state  can 
result  in  enormous  differences  in  global  behavior,  and  there  may  be  no  way  in 
principle  to  partition  the  space  of  global  states  in  such  a  way  that  specific  fine 
differences  have  the  appropriate  global  impact. 

On the other hand, systems that supply local rules for local behaviors can
provide a different rule for each and every possible local state. Furthermore, the
size  of  the  local  state  space  can  be  completely  independent  of  the  size  of  the  system. 
In  local  rule-governed  systems,  each  local  state,  and  consequently  the  global  state, 
can  be  determined  exactly  and  precisely.  Fine  differences  in  the  global  state  will 
result  in  very  specific  differences  in  the  local  state  and,  consequently,  will  affect  the 
invocation  of  local  rules.  As  fine  differences  affect  local  behavior,  the  difference  will 
be  felt  in  an  expanding  patch  of  local  states,  and  in  this  manner — propagating  from 
local  neighborhood  to  local  neighborhood — fine  differences  in  global  state  can  result 
in  large  differences  in  global  behavior.  The  only  “special  cases”  explicitly  dealt  with 
in locally determined systems are exactly the set of all possible local states, and
the rules for these are just exactly the set of all local rules governing the system.
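The contrast between global and local state spaces, and the spreading of a fine difference from neighborhood to neighborhood, can be sketched as follows. The sketch assumes a two-state, one-dimensional lattice on a ring and uses the nonlinear rule "left XOR (center OR right)", known as rule 30, purely as an example; all function names are hypothetical.

```python
from typing import List


def global_state_count(k: int, n: int) -> int:
    """Number of global states of a system of n cells with k states each."""
    return k ** n


def local_rule_count(k: int, neighborhood: int) -> int:
    """Number of entries in a complete local rule table (independent of n)."""
    return k ** neighborhood


def step(cells: List[int]) -> List[int]:
    """One synchronous update of every cell from its three-cell neighborhood."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]


def differing_cells(a: List[int], b: List[int], steps: int) -> List[int]:
    """Count the cells in which two runs differ at each time step."""
    counts = []
    for _ in range(steps + 1):
        counts.append(sum(x != y for x, y in zip(a, b)))
        a, b = step(a), step(b)
    return counts


if __name__ == "__main__":
    print(global_state_count(2, 41), "global states versus",
          local_rule_count(2, 3), "local rules")
    first = [(i * i) % 3 % 2 for i in range(41)]
    second = list(first)
    second[20] ^= 1                      # a single fine difference
    print(differing_cells(first, second, 10))
```

Eight local rules cover every case the 41-cell lattice can present, while the single flipped cell affects a patch of cells that can widen by at most one cell per side at every step.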


5.  BIOLOGICAL  AUTOMATA 

Organisms have been compared to extremely complicated and finely tuned
biochemical machines. Since we know that it is possible to abstract the logical form of
a machine from its physical hardware, it is natural to ask whether it is possible to
abstract the logical form of an organism from its biochemical wetware. The field of
Artificial Life is devoted to the investigation of this question.

In the following sections we will look at the manner in which behavior is
generated in a bottom-up fashion in living systems. We then generalize the mechanisms
by which this behavior generation is accomplished, so that we may apply them to
the task of generating behavior in artificial systems.

We  will  find  that  the  essential  machinery  of  living  organisms  is  quite  a  bit 
different from the machinery of our own invention, and we would be quite
mistaken to attempt to force our preconceived notions of abstract machines onto the
machinery  of  life.  The  difference,  once  again,  lies  in  the  exceedingly  parallel  and 
distributed  nature  of  the  operation  of  the  machinery  of  life,  as  contrasted  with  the 
singularly  serial  and  centralized  control  structures  associated  with  the  machines  of 
our  invention. 

5.1  GENOTYPES  AND  PHENOTYPES 

The  most  salient  characteristic  of  living  systems,  from  the  behavior  generation 
point  of  view,  is  the  genotype/phenotype  distinction.  The  distinction  is  essentially 
one between a specification of machinery—the genotype—and the behavior of that
machinery—the phenotype.

The genotype is the complete set of genetic instructions encoded in the linear
sequence of nucleotide bases that makes up an organism’s DNA. The phenotype
is the physical organism itself—the structures that emerge in space and time as
the result of the interpretation of the genotype in the context of a particular
environment. The process by which the phenotype develops through time under the
direction of the genotype is called morphogenesis. The individual genetic
instructions are called genes and consist of short stretches of DNA. These instructions
are “executed”—or expressed—when their DNA sequence is used as a template
for transcription. In the case of protein synthesis, transcription results in a
duplicate nucleotide strand known as a messenger RNA—or mRNA—constructed by the
process of base-pairing. This mRNA strand may then be modified in various ways
before it makes its way out to the cytoplasm where, at bodies known as ribosomes,
it serves as a template for the construction of a linear chain of amino acids. The
resulting polypeptide chain will fold up on itself in a complex manner, forming a
tightly packed molecule known as a protein. The finished protein detaches from the
ribosome and may go on to serve as a passive structural element in the cell, or
may have a more active role as an enzyme. Enzymes are the functional molecular
“operators” in the logic of life.
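The expression pathway just described can be sketched as two table lookups. This is a deliberately simplified illustration: it ignores strand direction, mRNA modification, and folding, the codon table is only a small excerpt of the standard genetic code, and the function names and the example template are hypothetical.

```python
from typing import Dict, List, Optional

# Base-pairing used in transcription: DNA template base -> mRNA base.
PAIRING: Dict[str, str] = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Excerpt of the standard genetic code; None marks a stop codon.
CODONS: Dict[str, Optional[str]] = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,
}


def transcribe(template: str) -> str:
    """Build the mRNA strand by pairing each template base with its partner."""
    return "".join(PAIRING[base] for base in template)


def translate(mrna: str) -> List[str]:
    """Read the mRNA three bases at a time, appending one amino acid per
    codon until a stop codon is reached."""
    chain: List[str] = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODONS[mrna[i:i + 3]]
        if residue is None:
            break
        chain.append(residue)
    return chain


if __name__ == "__main__":
    mrna = transcribe("TACAAACCGATT")
    print(mrna, translate(mrna))
```

The sketch stops at the linear chain of amino acids; the folding step that turns the chain into a working protein is exactly the part that cannot be read off from the tables.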

One may consider the genotype as a largely unordered “bag” of instructions,
each one of which is essentially the specification for a “machine” of some sort—
passive or active. When instantiated, each such “machine” will enter into the
ongoing logical “fray” in the cytoplasm, consisting largely of local interactions between
other such machines. Each such instruction will be “executed” when its own
triggering conditions are met and will have specific, local effects on structures in the
cell. Furthermore, each such instruction will operate within the context of all of the
other instructions that have been—or are being—executed.

The phenotype, then, consists of the structures and dynamics that emerge
through time in the course of the execution of the parallel, distributed
“computation” controlled by this genetic “bag” of instructions. Since genes’ interactions
with one another are highly nonlinear, the phenotype is a nonlinear function of the
genotype.
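One way to picture an unordered “bag” of instructions with triggering conditions is the toy sketch below. The representation of an instruction as a (trigger, products) pair, the molecule names, and the rule that products only accumulate are all simplifying assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical instruction: (molecules that must be present, molecules produced).
Instruction = Tuple[FrozenSet[str], FrozenSet[str]]

BAG: List[Instruction] = [
    (frozenset({"signal"}), frozenset({"enzymeA"})),
    (frozenset({"enzymeA"}), frozenset({"enzymeB"})),
    (frozenset({"enzymeA", "enzymeB"}), frozenset({"structure"})),
    (frozenset({"absent"}), frozenset({"never"})),
]


def express(state: Set[str], bag: Iterable[Instruction]) -> Set[str]:
    """One parallel step: every instruction whose trigger is met fires,
    all judged against the same current state."""
    produced: Set[str] = set()
    for trigger, products in bag:
        if trigger <= state:
            produced |= products
    return state | produced


def develop(seed: Iterable[str], bag: List[Instruction],
            max_steps: int = 100) -> Set[str]:
    """Repeat parallel steps until nothing new is produced."""
    state = set(seed)
    for _ in range(max_steps):
        nxt = express(state, bag)
        if nxt == state:
            break
        state = nxt
    return state


if __name__ == "__main__":
    print(sorted(develop({"signal"}, BAG)))
```

Because each instruction fires on its own conditions rather than on its position in a list, reordering `BAG` leaves the outcome unchanged; what matters is the context created by the other instructions.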


5.2  GENERALIZED  GENOTYPES  AND  PHENOTYPES 

In the context of Artificial Life, we need to generalize the notions of genotype and
phenotype, so that we may apply them in non-biological situations. We will use
the term generalized genotype—or GTYPE—to refer to any largely unordered set
of low-level rules, and we will use the term generalized phenotype—or PTYPE—
to refer to the behaviors and/or structures that emerge out of the interactions
among  these  low-level  rules  when  they  are  activated  within  the  context  of  a  specific 
environment.  The  GTYPE,  essentially,  is  the  specification  for  a  set  of  machines, 
while  the  PTYPE  is  the  behavior  that  results  as  the  machines  are  run  and  interact 
with  one  another. 

This  is  the  bottom-up  approach  to  the  generation  of  behavior.  A  set  of  entities 
is  defined,  and  each  entity  is  endowed  with  a  specification  for  a  simple  behavioral 
repertoire—a GTYPE—that contains instructions which detail its reactions to a
wide  range  of  local  encounters  with  other  such  entities  or  with  specific  features  of 
the  environment.  Nowhere  is  the  behavior  of  the  set  of  entities  as  a  whole  specified. 
The  global  behavior  of  the  aggregate — the  PTYPE — emerges  out  of  the  collective 
interactions  among  individual  entities. 

It  should  be  noted  that  the  PTYPE  is  a  multilevel  phenomenon.  First,  there 
is  the  PTYPE  associated  with  each  particular  instruction — the  effect  which  that 
instruction  has  on  an  entity’s  behavior  when  it  is  expressed.  Second,  there  is  the 
PTYPE  associated  with  each  individual  entity — its  individual  behavior  within  the 
aggregate.  Third,  there  is  the  PTYPE  associated  with  the  behavior  of  the  aggregate 
as  a  whole. 

This  is  true  for  natural  systems  as  well.  We  can  talk  about  the  phenotypic  trait 
associated with a particular gene, we can identify the phenotype of an individual
cell,  and  we  can  identify  the  phenotype  of  an  entire  multi-cellular  organism — its 
body,  in  effect.  PTYPES  should  be  complex  and  multilevel.  If  we  want  to  simulate 
life,  we  should  expect  to  see  hierarchical  structures  emerge  in  our  simulations.  In 
general,  phenotypic  traits  at  the  level  of  the  whole  organism  will  be  the  result  of 
many  nonlinear  interactions  between  genes,  and  there  will  be  no  single  gene  to 
which  one  can  assign  responsibility  for  the  vast  majority  of  phenotypic  traits. 

In summary, GTYPES are low-level rules for behaviors—i.e., abstract
specifications for “machines”—which will engage in local interactions within a large
aggregate of other such behaviors. PTYPES are the behaviors—the structures in
time and space—that develop out of these nonlinear, local interactions (Figure 5).

[Figure 5 labels: “Global behaviors and structures emerge at this level”;
“DEVELOPMENT”; “Local rules govern simple nonlinear interactions at this level.”]

FIGURE 5 The relationship between GTYPE and PTYPE. Original figure appeared in
“Artificial Life” by Christopher Langton, in Artificial Life edited by C. Langton
(Addison-Wesley, Redwood City, 1989).


5.3  UNPREDICTABILITY  OF  PTYPE  FROM  GTYPE 

Nonlinear  interactions  between  the  objects  specified  by  the  GTYPE  provide  the 
basis for an extremely rich variety of possible PTYPES. PTYPES draw on the full
combinatorial  potential  implicit  in  the  set  of  possible  interactions  between  low-level 
rules.  The  other  side  of  the  coin,  however,  is  that  we  cannot  predict  the  PTYPES 
that  will  emerge  from  specific  GTYPES,  due  to  the  general  unpredictability  of 
nonlinear  systems.  If  we  wish  to  maintain  the  property  of  predictability,  then  we 
must  restrict  severely  the  nonlinear  dependence  of  PTYPE  on  GTYPE,  but  this 
forces  us  to  give  up  the  combinatorial  richness  of  possible  PTYPES.  Therefore,  a 
trade-off exists between behavioral richness and predictability (or
“programmability”). We shall see in the section on evolution that the lack of programmability is
adequately  compensated  for  by  the  increased  capacity  for  adaptiveness  provided  by 
a rich behavioral repertoire.

As discussed previously, we know that it is impossible in the general case to
determine any nontrivial property of the future behavior of a sufficiently
powerful computer from a mere inspection of its program and its initial state alone. A
Turing machine—the formal equivalent of a general purpose computer—can be
captured within the scheme of GTYPE/PTYPE systems by identifying the machine’s
transition table as the GTYPE and the resulting computation as the PTYPE. From
this  we  can  deduce  that  in  the  general  case  it  will  not  be  possible  to  determine, 
by  inspection  alone,  any  nontrivial  feature  of  the  PTYPE  that  will  emerge  from  a 
given  GTYPE  in  the  context  of  a  particular  initial  configuration.  In  general,  the 
only  way  to  find  out  anything  about  the  PTYPE  is  to  start  the  system  up  and 
watch  what  happens  as  the  PTYPE  develops  under  control  of  the  GTYPE  and  the 
environment. 
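The identification of a transition table with the GTYPE and the resulting computation with the PTYPE can be sketched with a very small Turing machine. The table used here is the familiar two-state “busy beaver”, chosen only as a compact example; the function name `develop` and the form of the recorded trace are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# GTYPE: (state, symbol read) -> (symbol to write, head move, next state).
GTYPE: Dict[Tuple[str, int], Tuple[int, int, str]] = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "HALT"),
}


def develop(gtype: Dict[Tuple[str, int], Tuple[int, int, str]],
            max_steps: int = 1000):
    """Run the machine on a blank tape. The recorded trace and the final tape
    play the role of the PTYPE."""
    tape: Dict[int, int] = defaultdict(int)
    head, state = 0, "A"
    trace: List[Tuple[str, int, int]] = []
    while state != "HALT" and len(trace) < max_steps:
        write, move, nxt = gtype[(state, tape[head])]
        tape[head] = write
        trace.append((state, head, write))
        head, state = head + move, nxt
    return trace, dict(tape), state


if __name__ == "__main__":
    trace, tape, state = develop(GTYPE)
    print(len(trace), "steps;", sum(tape.values()), "ones written;", state)
```

Even for a four-entry table, the number of steps and the final tape are most easily found by running the machine, and the `max_steps` cap is a reminder that for an arbitrary table we cannot know in advance whether the run will end.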

Similarly, it is not possible in the general case to determine what specific
alterations must be made to a GTYPE to effect a desired change in the PTYPE.
The problem is that any specific PTYPE trait is, in general, an effect of many,
many nonlinear interactions between the behavioral primitives of the system (an
“epistatic trait” in biological terms). Consequently, given an arbitrary proposed
change to the PTYPE, it may be impossible to determine by any formal procedure
exactly what changes would have to be made to the GTYPE to effect that—and
only that—change in the PTYPE. It is not a practically computable problem. There
is no way to calculate the answer—short of exhaustive search—even though there
may be an answer.

The  only  way  to  proceed  in  the  face  of  such  an  unpredictability  result  is  by 
a  process  of  trial  and  error.  However,  some  processes  of  trial  and  error  are  more 
efficient  than  others.  In  natural  systems,  trial  and  error  are  interlinked  in  such  a 
way  that  error  guides  the  choice  of  trials  under  the  process  of  evolution  by  natural 
selection. It is quite likely that this is the only efficient, general procedure that could
find  GTYPES  with  specific  PTYPE  traits  when  nonlinear  functions  are  involved. 
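A minimal sketch of trial and error in which error guides the choice of trials is given below. It is a bare mutation-and-selection loop over bit strings, not a model of natural selection: the GTYPE-to-PTYPE map, the error measure, and all names are hypothetical, and nothing guarantees that the loop reaches zero error.

```python
import random
from typing import List, Tuple


def develop(gtype: List[int]) -> List[int]:
    """Toy nonlinear GTYPE -> PTYPE map: each trait depends on a bit and on
    both of its neighbors (the string is treated as a ring)."""
    n = len(gtype)
    return [gtype[i - 1] ^ (gtype[i] | gtype[(i + 1) % n]) for i in range(n)]


def error(gtype: List[int], target: List[int]) -> int:
    """Number of PTYPE traits that differ from the desired ones."""
    return sum(p != t for p, t in zip(develop(gtype), target))


def evolve(target: List[int], generations: int = 2000,
           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Mutate one bit per generation and keep the variant unless it is worse."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in target]
    history = [error(parent, target)]
    for _ in range(generations):
        child = list(parent)
        child[rng.randrange(len(child))] ^= 1        # trial: a point mutation
        if error(child, target) <= error(parent, target):
            parent = child                           # error decides what survives
        history.append(error(parent, target))
    return parent, history


if __name__ == "__main__":
    goal = develop([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1])
    best, history = evolve(goal)
    print("error went from", history[0], "to", history[-1])
```

The loop never inspects how `develop` works. It only runs it and compares outcomes, which is the situation the unpredictability argument leaves us in.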


6.  RECURSIVELY  GENERATED  OBJECTS 

In  the  previous  section,  we  described  the  distinction  between  genotype  and  pheno¬ 
type,  and  we  introduced  their  generalizations  in  the  form  of  GTYPES  and  PTYPES. 
In  this  section,  we  will  review  a  general  approach  to  building  GTYPE/PTYPE  sys¬ 
tems  based  on  the  methodology  of  recursively  generated  objects. 

A major appeal of this approach is that it arises naturally from the GTYPE/
PTYPE  distinction:  the  local  developmental  rules — the  recursive  description  itself — 
constitute  the  GTYPE,  and  the  developing  structure — the  recursively  generated 
object  or  behavior  itself — constitutes  the  PTYPE. 


[*] An example in biology would be: What changes would have to be made to the genome in order to produce six fingers on each hand rather than five?

Under the methodology of recursively generated objects, the “object” is a structure that has sub-parts. The rules of the system specify how to modify the most elementary, “atomic” sub-parts, and are usually sensitive to the context in which these atomic sub-parts are embedded. That is, the state of the “neighborhood” of an atomic sub-part is taken into account in determining which rule to apply in order to modify that sub-part. It is usually the case that there are no rules in the system whose context is the entire structure; that is, there is no use made of global information. Each piece is modified solely on the basis of its own state and the state of the pieces “nearby.”

Of course, if the initial structure consists of a single part, as might be the case with the initial seed, then the context for applying a rule is necessarily global. The usual situation is that a structure consists of many parts, only a local sub-set of which determine the rule that will be used to modify any one sub-part of the structure.

A recursively generated object, then, is a kind of PTYPE, and the recursive description that generates it is a kind of GTYPE. The PTYPE will emerge under the action of the GTYPE, developing through time via a process akin to morphogenesis.

We will illustrate the notion of recursively generated objects with examples taken from the literature on L-systems, cellular automata, and computer animation.


6.1 EXAMPLE 1: LINDENMAYER SYSTEMS

Lindenmayer systems (L-systems) consist of sets of rules for rewriting strings of symbols, and bear strong relationships to the formal grammars treated by Chomsky. We will give several examples of L-systems illustrating the methodology of recursively generated objects.[*]

In the following, “X -> Y” means that one replaces every occurrence of symbol “X” in the structure with string “Y.” Since the symbol “X” may appear on the right as well as the left sides of some rules, the set of rules can be applied “recursively” to the newly rewritten structures. The process can be continued ad infinitum although some sets of rules will result in a “final” configuration when no more changes occur.

SIMPLE LINEAR GROWTH Here is an example of the simplest kind of L-system. The rules are context free, meaning that the context in which a particular part is situated is not considered when altering it. There must be only one rule per part if the system is to be deterministic.

The rules (the “recursive description” or GTYPE):

1) A -> CB

2) B -> A

3) C -> DA

4) D -> C


[*] For a more detailed review, see the book The Algorithmic Beauty of Plants.

When applied to the initial seed structure “A,” the following structural history develops (each successive line is a successive time step):

time  structure   rules applied (L to R)

0     A           (initial "seed")
1     C B         (rule 1 replaces A with CB)
2     D A A       (rule 3 replaces C with DA & rule 2 replaces B with A)
3     C C B C B   (rule 4 replaces D with C & rule 1 replaces the two A's with CB's)
4     ... (etc.)

And so forth.

The “PTYPE” that emerges from this kind of recursive application of a simple, local rewriting rule can get extremely complex. These kinds of grammars (whose rules replace single symbols) have been shown to be equivalent to the operation of finite state machines. With appropriate restrictions, they are also equivalent to the “regular languages” defined by Chomsky.

BRANCHING GROWTH L-systems incorporate meta-symbols to represent branching points, allowing a new line of symbols to branch off from the main “stem.”

The following grammar produces branching structures. The “( )” and “[ ]” notations indicate left and right branches, respectively, and the strings within them indicate the structure of the branches themselves.

The rules—or GTYPE:

1) A -> C[B]D

2) B -> A

3) C -> C

4) D -> C(E)A

5) E -> D

When  applied  to  the  starting  structure  “A,”  the  following  sequence  develops  (using 
linear  notation): 

time  structure                          rules applied (L to R)

0     A                                  initial "seed"
1     C[B]D                              rule 1
2     C[A]C(E)A                          rules 3,2,4
3     C[C[B]D]C(D)C[B]D                  rules 3,1,3,5,1
4     C[C[A]C(E)A]C(C(E)A)C[A]C(E)A      rules 3,3,2,4,3,4,3,2,4
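
The two context-free rule sets above can be checked mechanically. The following is a minimal Python sketch, not taken from the original text: the function names `rewrite` and `develop` and the dictionary encoding of the rules are our own assumptions. It applies every rule in parallel at each time step and copies the bracket meta-symbols unchanged.

```python
def rewrite(structure, rules):
    """Apply one parallel rewriting step.

    Every symbol that has a rule is replaced at once; symbols without a
    rule (here, the branch brackets) are copied unchanged.
    """
    return "".join(rules.get(symbol, symbol) for symbol in structure)


def develop(seed, rules, steps):
    """Return the structural history [t=0, t=1, ..., t=steps]."""
    history = [seed]
    for _ in range(steps):
        history.append(rewrite(history[-1], rules))
    return history


# GTYPE for simple linear growth.
LINEAR_RULES = {"A": "CB", "B": "A", "C": "DA", "D": "C"}

# GTYPE for branching growth; "[", "]", "(", ")" are meta-symbols with no rule.
BRANCHING_RULES = {"A": "C[B]D", "B": "A", "C": "C", "D": "C(E)A", "E": "D"}

if __name__ == "__main__":
    # Print the developing PTYPE in linear notation, one time step per line.
    for t, structure in enumerate(develop("A", BRANCHING_RULES, 4)):
        print(t, structure)
```

Because each symbol has exactly one rule, the development is deterministic: the same seed and GTYPE always yield the same history.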


In  two  dimensions,  the  structure  develops  as  follows: 

[Diagram: the structures at t = 0, 1, 2, 3, ... drawn as two-dimensional trees, with each “( )” or “[ ]” substring rendered as a side branch off the main stem.]

FIGURE 6 An L-system plant grown from rules incorporating graphical rendering information (axiom “plant,” with productions p1 through p7 rewriting plant, internode, seg, leaf, flower, pedicel, and wedge). Original figure appeared in The Algorithmic Beauty of Plants (Berlin: Springer-Verlag, 1991).


Note that at each step, every symbol is replaced, even if just by another copy of itself. Figure 6 shows the result of growing a structure using the rules shown there, which contain graphical rendering information in addition to the usual “structural” information.


SIGNAL  PROPAGATION  In  order  to  propagate  signals  along  a  structure,  one  must 
have  something  more  than  just  a  single  symbol  on  the  left-hand  side  of  a  rule. 
When there is more than one symbol on the left-hand side of a rule, the rules are
context  sensitive — i.e.,  the  “context”  within  which  a  symbol  occurs  (the  symbols 
next to it) is important in determining what the replacement string will be. The
next  example  illustrates  why  this  is  critical  for  signal  propagation. 

In the following example, the symbol in “{ }” is the symbol (or string of
symbols)  to  be  replaced,  the  rest  of  the  left-hand  side  is  the  context,  and  the 
symbols  “[”  and  “]”  indicate  the  left  and  right  ends  of  the  string,  respectively. 

Suppose  the  rule  set  contains  the  following  rules: 


1) [{C} -> C     a "C" at the left end of the string remains a "C."
2) C{C} -> C     a "C" with a "C" to its left remains a "C."
3) *{C} -> *     a "C" with an "*" to its left becomes an "*."
4) {*}C -> C     an "*" with a "C" to its right becomes a "C."
5) {*}] -> *     an "*" at the right end of the string remains an "*."


Under these rules, the initial structure “*CCCCCCC” will result in the “*” being propagated to the right, as follows:

time  structure

0     *CCCCCCC
1     C*CCCCCC
2     CC*CCCCC
3     CCC*CCCC
4     CCCC*CCC
5     CCCCC*CC
6     CCCCCC*C
7     CCCCCCC*

This would not be possible without taking the “context” of a symbol into account. In general, these kinds of grammars are equivalent to Chomsky’s “context-sensitive” or “Turing” languages, depending on whether or not there are any restrictions on the kinds of strings on the left- and right-hand sides.
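
The signal-propagation rules can be sketched in a few lines of Python. This is an illustrative sketch rather than the author's implementation: the function name `step` is our own, and the behavior for a “*” followed by another “*” (a case the five rules do not cover) is an assumption, namely that the symbol is left unchanged.

```python
def step(structure):
    """One parallel update of the signal-propagation rules.

    Every symbol is rewritten simultaneously from the old string, using its
    left or right neighbor as context.
    """
    new = []
    last = len(structure) - 1
    for i, symbol in enumerate(structure):
        left = structure[i - 1] if i > 0 else "["      # "[" marks the left end
        right = structure[i + 1] if i < last else "]"  # "]" marks the right end
        if symbol == "C":
            # Rules 1-3: a "C" becomes "*" only if a "*" sits to its left.
            new.append("*" if left == "*" else "C")
        elif symbol == "*":
            # Rules 4-5: a "*" becomes "C" if a "C" is to its right, and
            # remains "*" at the right end (assumed: also next to another "*").
            new.append("C" if right == "C" else "*")
        else:
            new.append(symbol)  # any other symbol is copied unchanged
    return "".join(new)


if __name__ == "__main__":
    structure = "*CCCCCCC"
    for t in range(8):
        print(t, structure)
        structure = step(structure)
```

Removing the neighbor lookups reduces the update to a context-free rewrite, under which each symbol's fate would be fixed regardless of position and the “*” could not travel.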

The  capacity  for  signal  propagation  is  extremely  important,  for  it  allows  ar¬ 
bitrary  computational  processes  to  be  embedded  within  the  structure,  which  may 
directly  affect  the  structure’s  development.  The  next  example  demonstrates  how 
embedded  computation  can  affect  development. 


6.2  EXAMPLE  2:  CELLULAR  AUTOMATA 

Cellular  automata  (CA)  provide  another  example  of  the  recursive  application  of  a 
simple  set  of  rules  to  a  structure.  In  CA,  the  structure  that  is  being  updated  is 
the  entire  universe:  a  lattice  of  finite  automata.  The  local  rule  set — the  GTYPE — 
in  this  case  is  the  transition  function  obeyed  homogeneously  by  every  automaton 
in  the  lattice.  The  local  context  taken  into  account  in  updating  the  state  of  each 
automaton  is  the  state  of  the  automata  in  its  immediate  neighborhood.  The  tran¬ 
sition  function  for  the  automata  constitutes  a  local  physics  for  a  simple,  discrete 
space/time  universe.  The  universe  is  updated  by  applying  the  local  physics  to  each 
local “cell” of its structure over and over again. Thus, although the physical structure itself doesn’t develop over time, its state does.
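
As a minimal sketch of this idea, consider a one-dimensional lattice of two-state automata. The choice of a one-dimensional binary lattice, the periodic boundary, and the use of rule 90 in the standard Wolfram numbering are all assumptions made for illustration; none of them come from the text, and the example is far simpler than the eight-state loop discussed below. The transition table plays the role of the GTYPE, and the same table is applied to every cell at every step.

```python
def make_rule(rule_number):
    """Build the transition table (the GTYPE) of an elementary CA.

    Maps each (left, self, right) neighborhood of 0/1 states to a new state,
    using the standard Wolfram numbering of the 256 possible rules.
    """
    table = {}
    for index in range(8):
        neighborhood = ((index >> 2) & 1, (index >> 1) & 1, index & 1)
        table[neighborhood] = (rule_number >> index) & 1
    return table


def update(cells, table):
    """Apply the same local rule to every cell at once (periodic boundary)."""
    n = len(cells)
    return [table[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]


if __name__ == "__main__":
    table = make_rule(90)
    cells = [0] * 15
    cells[7] = 1  # a single "seed" cell in state 1
    for _ in range(8):
        print("".join(".#"[c] for c in cells))
        cells = update(cells, table)
```

Note that the lattice itself never grows or shrinks; only the states stored in it change from one time step to the next.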

Within  such  universes,  one  can  embed  all  manner  of  processes,  relying  on  the 
context  sensitivity  of  the  rules  to  local  neighborhood  conditions  to  propagate  infor¬ 
mation  around  within  the  universe  “meaningfully.”  In  particular,  one  can  embed 
general purpose computers. Since these computers are simply particular configurations of states within the lattice of automata, they can compute over the very set of symbols out of which they are constructed. Thus, structures in this universe can
compute  and  construct  other  structures,  which  also  may  compute  and  construct. 

For  example,  here  is  the  simplest  known  structure  that  can  reproduce  itself: 

 22222222
2170140142
2022222202
272    212
212    212
202    212
272    212
21222222122222
207107107111112
 2222222222222

Each number is the state of one automaton in the lattice. Blank space is presumed to be in state “0.” The “2”-states form a sheath around the “1”-state data path. The “7 0” and “4 0” state pairs constitute signals embedded within the data path. They will propagate counterclockwise around the loop, cloning off copies which propagate down the extended tail as they pass the T-junction between loop and tail. When the signals reach the end of the tail, they have the following effects: each “7 0” signal extends the tail by one unit, and the two “4 0” signals construct a left-hand corner at the end of the tail. Thus, for each full cycle of the instructions around the loop, another side and corner of an “offspring-loop” will be constructed. When the tail finally runs into itself after four cycles, the collision of signals results in the disconnection of the two loops as well as the construction of a tail on each of the loops.

After  151  time  steps,  this  system  will  evolve  to  the  following  configuration: 


        2
       212
       272
       202
       212
 222222272     22222222
2111701702    2170140142
2122222212    2022222202
212    272    272    212
212    202    212    212
242    212    202    212
212    272    272    212
2022222202    21222222122222
2410710712    207107107111112
 22222222      2222222222222


Thus,  the  initial  configuration  has  succeeded  in  reproducing  itself. 

Each  of  these  loops  will  go  on  to  reproduce  itself  in  a  similar  manner,  giving 
rise  to  an  expanding  colony  of  loops,  growing  out  into  the  array. 

These  embedded  self-reproducing  loops  are  the  result  of  the  recursive  appli¬ 
cation  of  a  rule  to  a  seed  structure.  In  this  case,  the  primary  rule  that  is  being 
recursively  applied  constitutes  the  “physics”  of  the  universe.  The  initial  state  of 
the  loop  itself  constitutes  a  little  “computer”  under  the  recursively  applied  physics 
of  the  universe:  a  computer  whose  program  causes  it  to  construct  a  copy  of  itself. 
The  “program”  within  the  loop  computer  is  also  applied  recursively  to  the  growing 
structure.  Thus,  this  system  really  involves  a  double  level  of  recursively  applied 
rules.  The  mechanics  of  applying  one  recursive  rule  within  a  universe  whose  physics 
is  governed  by  another  recursive  rule  had  to  be  worked  out  by  trial  and  error.  This 
system  makes  use  of  the  signal  propagation  capacity  to  embed  a  structure  that 
itself  computes  the  resulting  structure,  rather  than  having  the  “physics”  directly 
responsible  for  developing  the  final  structure  from  a  passive  seed. 

This  captures  the  flavor  of  what  goes  on  in  natural  biological  development:  the 
genotype codes for the constituents of a dynamic process in the cell, and it is this
dynamic  process  that  is  primarily  responsible  for  mediating — or  “computing” — the 
expression  of  the  genotype  in  the  course  of  development. 


6.3 EXAMPLE 3: FLOCKING “BOIDS”

The previous examples were largely concerned with the growth and development of structural PTYPES. Here, we give an example of the development of a behavioral PTYPE.

FIGURE 7 A flock of “Boids” negotiating a field of columns. Sequence generated by Craig Reynolds. Original figure appeared in “Artificial Life” by Christopher Langton, in Artificial Life, edited by C. Langton (Addison-Wesley, Redwood City, 1989).

Craig Reynolds has implemented a simulation of flocking behavior. In this
model — which  is  meant  to  be  a  general  platform  for  studying  the  qualitatively  sim¬ 
ilar  phenomena  of  flocking,  herding,  and  schooling — one  has  a  large  collection  of 
autonomous  but  interacting  objects  (which  Reynolds  refers  to  as  “Boids”),  inhab¬ 
iting  a  common  simulated  environment. 

The  modeler  can  specify  the  manner  in  which  the  individual  Boids  will  respond 
to  local  events  or  conditions.  The  global  behavior  of  the  aggregate  of  Boids  is  strictly 
an emergent phenomenon: none of the rules for the individual Boids depend on
global  information,  and  the  only  updating  of  the  global  state  is  done  on  the  basis 
of  individual  Boids  responding  to  local  conditions. 

Each Boid in the aggregate shares the same behavioral “tendencies”:

■  to  maintain  a  minimum  distance  from  other  objects  in  the  environment,  in¬ 
cluding  other  Boids, 

■ to match velocities with Boids in its neighborhood, and

■  to  move  toward  the  perceived  center  of  mass  of  the  Boids  in  its  neighborhood. 

These  are  the  only  rules  governing  the  behavior  of  the  aggregate. 
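
The three tendencies can be sketched as a single local update in Python. This is a hedged illustration, not Reynolds' implementation: the two-dimensional setting, the dictionary representation of a Boid, the function name `step`, and every parameter name and default weight are assumptions chosen for brevity. Environmental obstacles are omitted, so only other Boids are avoided.

```python
import math


def step(boids, radius=10.0, min_dist=2.0, dt=1.0,
         w_avoid=0.05, w_match=0.05, w_center=0.01):
    """Advance every Boid one time step using only local information.

    Each Boid is a dict with "pos" and "vel", both (x, y) tuples. All
    parameter names and default weights are illustrative assumptions.
    """
    updated = []
    for me in boids:
        px, py = me["pos"]
        vx, vy = me["vel"]
        # The neighborhood: every other Boid within the perception radius.
        near = [b for b in boids
                if b is not me and math.dist(b["pos"], me["pos"]) < radius]
        ax = ay = 0.0
        if near:
            n = len(near)
            # Tendency 1: keep a minimum distance from nearby Boids.
            for b in near:
                if math.dist(b["pos"], me["pos"]) < min_dist:
                    ax += w_avoid * (px - b["pos"][0])
                    ay += w_avoid * (py - b["pos"][1])
            # Tendency 2: match the average velocity of the neighbors.
            ax += w_match * (sum(b["vel"][0] for b in near) / n - vx)
            ay += w_match * (sum(b["vel"][1] for b in near) / n - vy)
            # Tendency 3: move toward the neighbors' center of mass.
            ax += w_center * (sum(b["pos"][0] for b in near) / n - px)
            ay += w_center * (sum(b["pos"][1] for b in near) / n - py)
        vx, vy = vx + ax * dt, vy + ay * dt
        updated.append({"pos": (px + vx * dt, py + vy * dt), "vel": (vx, vy)})
    return updated
```

Each new state is computed from the old list of Boids, so the update is synchronous, and no Boid ever consults a global quantity such as the center of the whole flock.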

These  rules,  then,  constitute  the  generalized  genotype  (GTYPE)  of  the  Boids 
system.  They  say  nothing  about  structure,  or  growth  and  development,  but  they 
determine the behavior of a set of interacting objects, out of which very natural
motion  emerges. 

With  the  right  settings  for  the  parameters  of  the  system,  a  collection  of  Boids 
released  at  random  positions  within  a  volume  will  collect  into  a  dynamic  flock, 
which  flies  around  environmental  obstacles  in  a  very  fluid  and  natural  manner, 
occasionally  breaking  up  into  sub-flocks  as  the  flock  flows  around  both  sides  of  an 
obstacle.  Once  broken  up  into  sub-flocks,  the  sub-flocks  reorganize  around  their 
own,  now  distinct  and  isolated  centers  of  mass,  only  to  re-merge  into  a  single  flock 
again when both sub-flocks emerge at the far side of the obstacle and each sub-flock
feels  anew  the  “mass”  of  the  other  sub-flock  (Figure  7). 

The  flocking  behavior  itself  constitutes  the  generalized  phenotype  (PTYPE) 
of  the  Boids  system.  It  bears  the  same  relation  to  the  GTYPE  as  an  organism’s 
morphological  phenotype  bears  to  its  molecular  genotype.  The  same  distinction 
between the specification of machinery and the behavior of machinery is evident.


6.4  DISCUSSION  OF  EXAMPLES 

In  all  of  the  above  examples,  the  recursive  rules  apply  to  local  structures  only,  and 
the  PTYPE — structural  or  behavioral — that  results  at  the  global  level  emerges  out 
of  all  local  activity  taken  collectively.  Nowhere  in  the  system  are  there  rules  for 
the behavior of the system at the global level. This is a much more powerful and
simple  approach  to  the  generation  of  complex  behavior  than  that  typically  taken  in 
AI,  for  instance,  where  “expert  systems”  attempt  to  provide  global  rules  for  global 
behavior.  Recursive,  “bottom  up”  specifications  yield  much  more  natural,  fluid,  and 
flexible behavior at the global level than typical “top down” specifications, and they
do  so  much  more  parsimoniously. 

IMPORTANCE  OF  CONTEXT  SENSITIVITY.  It  is  worthwhile  to  note  that  context- 
sensitive  rules  in  GTYPE/PTYPE  systems  provide  the  possibility  for  nonlinear 
interactions  among  the  parts.  Without  context  sensitivity,  the  systems  would  be 
linearly  decomposable,  information  could  not  “flow”  throughout  the  system  in  any 
meaningful  manner,  and  complex  long-range  dependencies  between  remote  parts  of 
the  structures  could  not  develop. 

FEEDBACK  BETWEEN  THE  LOCAL  AND  THE  GLOBAL  LEVELS.  There  is  also  a  very 
important  feedback  mechanism  between  levels  in  such  systems:  the  interactions 
among  the  low-level  entities  give  rise  to  the  global-level  dynamics  which,  in  turn,  af¬ 
fects  the  lower  levels  by  setting  the  local  context  within  which  each  entity’s  rules  are 
invoked.  Thus,  local  behavior  supports  global  dynamics,  which  shapes  local  context, 
which  affects  local  behavior,  which  supports  global  dynamics,  and  so  forth. 


6.5  GENUINE  LIFE  IN  ARTIFICIAL  SYSTEMS 

It  is  important  to  distinguish  the  ontological  status  of  the  various  levels  of  behavior 
in  such  systems.  At  the  level  of  the  individual  behavors,  we  have  a  clear  differ¬ 
ence  in  kind:  Boids  are  not  birds,  they  are  not  even  remotely  like  birds,  they  have 
no  cohesive  physical  structure,  but  rather  they  exist  as  information  structures — 
processes — within  a  computer.  But — and  this  is  the  critical  “But”— at  the  level 
of  behaviors,  flocking  Boids  and  flocking  birds  are  two  instances  of  the  same  phe¬ 
nomenon:  flocking. 

The  behavior  of  a  flock  as  a  whole  does  not  depend  critically  on  the  internal 
details  of  the  entities  of  which  it  is  constituted,  only  on  the  details  of  the  way 
in  which  these  entities  behave  in  each  other’s  presence.  Thus,  flocking  in  Boids  is 
true  flocking,  and  may  be  counted  as  another  empirical  data  point  in  the  study 
of  flocking  behavior  in  general,  right  up  there  with  flocks  of  geese  and  flocks  of 
starlings. 

This  is  not  to  say  that  flocking  Boids  capture  all  the  nuances  upon  which 
flocking  behavior  depends,  or  that  the  Boid’s  behavioral  repertoire  is  sufficient  to 
exhibit  all  the  different  modes  of  flocking  that  have  been  observed — such  as  the 
classic “V” formation of flocking geese. The crucial point is that we have captured,
within  an  aggregate  of  artificial  entities,  a  bona  fide  lifelike  behavior,  and  that  the 
behavior  emerges  within  the  artificial  system  in  the  same  way  that  it  emerges  in 
the  natural  system. 

The  same  is  true  for  L-systems  and  the  self-reproducing  loops.  The  constituent 
parts  of  the  artificial  systems  are  different  kinds  of  things  from  their  natural  counter¬ 
parts,  but  the  emergent  behaviors  that  they  support  are  the  same  kinds  of  thing  as 
their  natural  counterparts:  genuine  morphogenesis  and  differentiation  for  L-systems, 
and genuine self-reproduction in the case of the loops.

The claim is the following. The “artificial” in Artificial Life refers to the component parts, not the emergent processes. If the component parts are implemented correctly, the processes they support are genuine—every bit as genuine as the natural processes they imitate.

The  big  claim  is  that  a  properly  organized  set  of  artificial  primitives  carrying 
out  the  same  functional  roles  as  the  biomolecules  in  natural  living  systems  will 
support  a  process  that  will  be  “alive”  in  the  same  way  that  natural  organisms  are 
alive.  Artificial  Life  will  therefore  be  genuine  life — it  will  simply  be  made  of  different 
stuff than the life that has evolved here on Earth.


7.  EVOLUTION 

7.1  EVOLUTION:  FROM  ARTIFICIAL  SELECTION  TO  NATURAL  SELECTION 

Modern  organisms  owe  their  structure  to  the  complex  process  of  biological  evolu¬ 
tion,  and  it  is  very  difficult  to  discern  which  of  their  properties  are  due  to  chance 
and  which  to  necessity.  If  biologists  could  “rewind  the  tape”  of  evolution  and  start  it 
over,  again  and  again,  from  different  initial  conditions,  or  under  different  regimes  of 
external perturbations along the way, they would have a full ensemble of evolutionary pathways to generalize over. Such an ensemble would allow them to distinguish universal, necessary properties (those which were observed in all the pathways in the ensemble) from accidental, chance properties (those which were unique to individual pathways). However, biologists cannot rewind the tape of evolution, and
are  stuck  with  a  single,  actual  evolutionary  trace  out  of  a  vast,  intuited  ensemble 
of  possible  traces. 

Although  studying  computer  models  of  evolution  is  not  the  same  as  studying 
the  “real  thing,”  the  ability  to  freely  manipulate  computer  experiments — to  “rewind 
the  tape,”  perturb  the  initial  conditions,  and  so  forth — can  more  than  make  up  for 
their “lack” of reality.

It has been known for some time that one can evolve computer programs by the process of natural selection among a population of variant programs. Each individual program in a population of programs is evaluated for its performance on some task. The programs that perform best are allowed to “breed” with one another via Genetic Algorithms. The offspring of these better-performing parent
programs  replace  the  worst-performing  programs  in  the  population,  and  the  cycle  is 
iterated.  Such  evolutionary  approaches  to  program  improvement  have  been  applied 
primarily  to  the  tasks  of  function  optimization  and  machine  learning. 

However,  such  evolutionary  models  have  rarely  been  used  to  study  evolution 
itself. Researchers have primarily concentrated on the results, rather than on the process, of evolution. In the spirit of von Neumann’s research on self-reproduction via the study of self-reproducing automata, the following sections review studies of the process of evolution by studying evolving populations of “automata.”


7.2 ENGINEERING PTYPES FROM GTYPES

In  the  preceding  sections,  we  have  mentioned  several  times  the  formal  impossibility 
of predicting the behavior of an arbitrary machine by mere inspection of its specification and initial state. In the general case, we must run a machine in order to
determine  its  behavior. 

The  consequence  of  this  unpredictability  for  GTYPE/PTYPE  systems  is  that 
we  cannot  determine  the  PTYPE  that  will  be  produced  by  an  arbitrary  GTYPE 
by inspection alone. We must “run” the GTYPE in the context of a specific environment, and let the PTYPE develop in order to determine the resulting structure
and  its  behavior. 

This  is  even  further  complicated  when  the  environment  consists  of  a  population 
of  PTYPES  engaged  in  nonlinear  interactions,  in  which  case  the  determination  of 
a  PTYPE  depends  on  the  behavior  of  the  specific  PTYPES  it  is  interacting  with, 
and  on  the  emergent  details  of  the  global  dynamics. 

Since,  for  any  interesting  system,  there  will  exist  an  enormous  number  of  po¬ 
tential  GTYPES,  and  since  there  is  no  formal  method  for  deducing  the  PTYPES 
from  the  GTYPES,  how  do  we  go  about  finding  GTYPES  that  will  generate  lifelike 
PTYPES?  Or  PTYPES  that  exhibit  any  other  particular  sought-after  behavior? 

Until  now,  the  process  has  largely  been  one  of  guessing  at  appropriate 
GTYPES,  and  modifying  them  by  trial  and  error  until  they  generate  the  appropri¬ 
ate  PTYPES.  However,  this  process  is  limited  by  our  preconceptions  of  what  the 
appropriate  PTYPES  would  be,  and  by  our  restricted  notions  of  how  to  generate 
GTYPES.  We  would  like  to  be  able  to  automate  the  process  so  that  our  pre¬ 
conceptions  and  limited  abilities  to  conceive  of  machinery  do  not  overly  constrain 
the  search  for  GTYPES  that  will  yield  the  appropriate  behaviors. 


7.3  NATURAL  SELECTION  AMONG  POPULATIONS  OF  VARIANTS 

Nature, of course, has had to face the same problem, and has hit upon an elegant
solution:  evolution  by  the  process  of  natural  selection  among  populations  of  variants. 
The  scheme  is  a  very  simple  one.  However,  in  the  face  of  the  formal  impossibility 
of predicting behavior from machine description alone, it may well be the only
efficient,  general  scheme  for  searching  the  space  of  possible  GTYPES. 

The mechanism of evolution is as follows. A set of GTYPES is interpreted within a specific environment, forming a population of PTYPES which interact with one another and with features of the environment in various complex ways. On the basis of the relative performance of their associated PTYPES, some GTYPES are duplicated in larger numbers than others, and they are duplicated in such a way that the copies are similar to—but not exactly the same as—the originals. These variant GTYPES develop into variant PTYPES, which enter into the complex interactions within the environment, and the process is continued ad infinitum (Figure 8). As expected from the formal limitations on predictability, GTYPES must be “run” (i.e., turned into PTYPES) in an environment and their behaviors must be evaluated explicitly; their implicit behavior cannot be determined.

FIGURE 8 The process of evolution by natural selection. Original figure appeared in “Artificial Life” by Christopher Langton, in Artificial Life, edited by C. Langton (Addison-Wesley, Redwood City, 1989).


7.4  GENETIC  ALGORITHMS 

In  the  spirit  of  von  Neumann,  John  Holland  has  attempted  to  abstract  “the  logical 
form”  of  the  natural  process  of  biological  evolution  in  what  he  calls  the  “Genetic 
Algorithm” (GA). In the GA, a GTYPE is represented as a character string
that  encodes  a  potential  solution  to  a  problem.  For  instance,  the  character  string 
might  encode  the  weight  matrix  of  a  neural  network,  or  the  transition  table  of 
a  finite  state  machine.  These  character  strings  are  rendered  as  PTYPES  via  a 
problem-specific  interpreter,  which  constructs,  for  example,  the  neural  net  or  finite 
state machine specified by each GTYPE, evaluates its performance in the problem
domain,  and  provides  it  with  a  specific  fitness  value,  or  “strength.” 

The  GA  implements  natural  selection  by  making  more  copies  of  the  character 
strings  representing  the  better  performing  PTYPES.  The  GA  generates  variant 
GTYPES  by  applying  genetic  operators  to  these  character  strings.  The  genetic 
operators  typically  consist  of  reproduction,  crossover,  and  mutation,  with  occasional 
usage  of  inversion  and  duplication. 

Recently, John Koza has developed a version of the GA, which he calls the Genetic Programming Paradigm (GPP), that extends the genetic operators to work on GTYPES that are simple expressions in a standard programming language. The GPP differs from the traditional GA in that these program expressions are not represented as simple character strings but rather as the parse trees of the expressions. This makes it easier for the genetic operators to obey the syntax of the programming language when producing variant GTYPES.

Figure 9 shows some examples of GTYPES in the GA and GPP paradigms.

(a) 11001010111010110101000101011011101011

(b) (OR (NOT a) (AND b c))

          OR
         /  \
      NOT    AND
       |     /  \
       a    b    c

FIGURE 9 GTYPES in the GA and GPP paradigms.
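
To make the parse-tree representation concrete, the following Python sketch encodes the expression of Figure 9(b) as a nested tuple and interprets it. The tuple encoding and the names `evaluate`, `to_sexpr`, and `TREE` are assumptions made for illustration; they are not Koza's notation.

```python
def evaluate(tree, env):
    """Interpret a parse-tree GTYPE as a Boolean value.

    A tree is either a variable name (str) or a tuple (operator, *subtrees).
    """
    if isinstance(tree, str):
        return env[tree]
    op, *args = tree
    values = [evaluate(arg, env) for arg in args]
    if op == "NOT":
        return not values[0]
    if op == "AND":
        return all(values)
    if op == "OR":
        return any(values)
    raise ValueError(f"unknown operator: {op}")


def to_sexpr(tree):
    """Render the tree back into parenthesized prefix form."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_sexpr(t) for t in tree[1:]]) + ")"


# The GPP-style GTYPE of Figure 9(b) as a nested tuple.
TREE = ("OR", ("NOT", "a"), ("AND", "b", "c"))

if __name__ == "__main__":
    print(to_sexpr(TREE))
    print(evaluate(TREE, {"a": True, "b": True, "c": False}))
```

Because every sub-tree is itself a well-formed expression, an operator that swaps or replaces whole sub-trees cannot produce an unbalanced or ill-formed program, which is the advantage over editing raw character strings.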

THE  GENETIC  OPERATORS  The  genetic  operators  work  as  follows. 

Reproduction is the most basic operator. It is often implemented in the form
of fitness proportionate reproduction, which means that strings are duplicated in
direct proportion to their relative fitness values. Once all strings have been
evaluated, the average fitness of the population is computed, and those strings whose
fitness is higher than the population average have a higher probability of being
duplicated, while those strings whose fitness is lower than the population average
have a lower probability of being duplicated. There are many variations on this
scheme, but most implementations of the GA or the GPP use some form of fitness
proportionate reproduction as the means to implement “selection.” Another form
of selection is to simply keep the top 10% or so of the population and throw away
the rest, using the survivors as breeding stock for the next generation.
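As a rough illustration of the two selection schemes just described, the following Python sketch draws copies in proportion to fitness and, alternatively, keeps only the top fraction of the population. The function names, the `rng` parameter, and the string representation are assumptions made for this example; they are not taken from any particular GA implementation.

```python
import random


def fitness_proportionate_selection(population, fitnesses, n_copies, rng=random):
    """Draw n_copies individuals, each with probability fitness / total fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    chosen = []
    for _ in range(n_copies):
        spin = rng.random() * total          # a point on the "roulette wheel"
        running = 0.0
        for individual, fitness in zip(population, fitnesses):
            running += fitness
            if spin < running:               # the wheel stopped in this slice
                chosen.append(individual)
                break
        else:
            chosen.append(population[-1])    # guard against float round-off
    return chosen


def truncation_selection(population, fitnesses, fraction=0.1):
    """Keep only the best `fraction` of the population as breeding stock."""
    ranked = sorted(zip(fitnesses, population), key=lambda pair: pair[0], reverse=True)
    keep = max(1, int(len(population) * fraction))
    return [individual for _, individual in ranked[:keep]]
```

A string with above-average fitness occupies a larger slice of the wheel and is therefore duplicated more often on average, while a string with zero fitness is never drawn.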

Mutation in the GA is simply the replacement of one or more characters in
a character string GTYPE with another character picked at random. In binary
strings, this simply amounts to random bit flips. In the GPP, mutation is
implemented by picking a sub-tree of the parse tree at random, and replacing it with a
randomly generated sub-tree whose root node is of the same syntactic type as the
root node of the replaced sub-tree.
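For the binary-string case, mutation can be sketched in a few lines of Python. The per-bit mutation `rate` and the function name are assumptions for illustration; the GPP sub-tree variant is not covered here.

```python
import random


def mutate_bits(gtype, rate, rng=random):
    """Flip each bit of a binary-string GTYPE independently with probability `rate`."""
    flipped = []
    for bit in gtype:
        if rng.random() < rate:
            flipped.append("1" if bit == "0" else "0")   # random bit flip
        else:
            flipped.append(bit)                          # bit copied unchanged
    return "".join(flipped)
```

With a small rate most offspring are exact copies, and only occasionally does a string differ from its parent in one or two positions.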




Christopher  G.  Langton 


FIGURE 10 Crossover operation in the GA and GPP.


Crossover is an analog of sexual recombination. In the GA, this is accomplished
by picking two “parent” character strings, lining them up side-by-side, and
interchanging equivalent sub-strings between them, producing two new strings
that each contain a mix of their parents’ genetic information. Crossover is an
extremely important genetic operator. Whereas mutation is equivalent to random
search, crossover allows the more “intelligent” search strategy of putting things
that have proved useful into new combinations.
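A minimal sketch of string crossover, assuming the simplest common variant (a single cut point chosen at random, with the tails exchanged). The function name and the choice of one-point crossover are assumptions for this example; the text does not commit to a particular variant.

```python
import random


def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length strings at a random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2:
        raise ValueError("parents must have at least two characters")
    cut = rng.randrange(1, len(parent_a))        # cut strictly inside the string
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

Each child keeps the head of one parent and the tail of the other, so any useful block of characters that lies entirely on one side of the cut is passed on intact.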

In  the  GPP,  crossover  is  implemented  by  picking  two  “parent”  parse  trees, 
locating  syntactically  similar  sub-trees  within  each,  and  swapping  them. 
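The parse-tree version can be sketched as follows, representing an expression such as (OR (NOT a) (AND b c)) as a nested Python tuple. This is an illustrative simplification, not Koza's implementation: because every sub-expression here is Boolean-valued, any two sub-trees are assumed to count as "syntactically similar." All function names are hypothetical.

```python
import random


def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) to every sub-tree, root included."""
    yield path
    if isinstance(tree, tuple):                  # (operator, child, child, ...)
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))


def get_subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the sub-tree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def subtree_crossover(parent_a, parent_b, rng=random):
    """Pick one sub-tree in each parent at random and swap them."""
    path_a = rng.choice(list(subtree_paths(parent_a)))
    path_b = rng.choice(list(subtree_paths(parent_b)))
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))
```

Since whole sub-trees are exchanged, both offspring are again well-formed expressions, which is the syntactic advantage of operating on parse trees rather than on raw character strings.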

Figure  10  illustrates  the  crossover  operation  in  the  GA  and  GPP. 

Inversion is used rarely, in order to rearrange the relative locations of specific
pieces of genetic information in the character strings of the GA.

Duplication  is  sometimes  used  in  situations  where  it  makes  sense  for  the 
genome  to  grow  in  length,  representing,  for  instance,  larger  neural  nets,  or  bigger 
finite  state  machine  transition  tables. 
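The two less common operators can be sketched on plain strings as follows. This is a simplified reading: in this sketch inversion simply reverses a segment and duplication inserts a second copy of a segment, and the function names and index conventions are assumptions rather than details given in the text.

```python
def invert_segment(gtype, start, end):
    """Reverse the order of the characters in gtype[start:end]."""
    return gtype[:start] + gtype[start:end][::-1] + gtype[end:]


def duplicate_segment(gtype, start, end):
    """Insert a second copy of gtype[start:end] right after the original segment."""
    return gtype[:end] + gtype[start:end] + gtype[end:]
```

For example, `invert_segment("abcdef", 1, 4)` rearranges the middle of the string without changing its length, whereas `duplicate_segment("abcdef", 1, 3)` lengthens the genome by two characters.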

THE  OPERATION  OF  THE  GENETIC  ALGORITHM  The  basic  outline  of  the  genetic 
algorithm  is  as  follows: 

1.  Generate  a  random  initial  population  of  GTYPES. 

2.  Render  the  GTYPES  in  the  population  as  PTYPES  and  evaluate  them  in  the 
problem  domain,  providing  each  GTYPE  with  a  fitness  value. 

3.  Duplicate  GTYPES  according  to  their  relative  fitness  using  a  scheme  like  fitness 
proportionate  reproduction. 

4.  Apply  genetic  operators  to  the  GTYPES  in  the  population,  typically  picking 
crossover  partners  as  a  function  of  their  relative  fitness. 

5.  Replace  the  least-fit  GTYPES  in  the  population  with  the  offspring  generated 
in  the  last  several  steps. 

6.  Go  back  to  step  2  and  iterate. 

Although quite simple in outline, the genetic algorithm has proved remarkably
powerful in a wide variety of applications, and provides a useful tool for both the
study and the application of evolution.
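The six steps outlined above can be sketched as a short Python loop. Everything specific here is an assumption chosen for illustration: the toy fitness function (counting 1-bits), the parameter values, the merging of steps 3 and 4 into fitness-weighted parent choice, and the use of one-point crossover plus bit-flip mutation.

```python
import random


def count_ones(gtype):
    """Toy problem domain (an assumption): fitness is the number of 1-bits."""
    return gtype.count("1")


def run_ga(fitness_fn, length=20, pop_size=30, generations=40,
           mutation_rate=0.02, n_replace=10, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population of GTYPES.
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate each GTYPE (here the "PTYPE" is the string itself).
        fitnesses = [fitness_fn(g) for g in population]
        # Steps 3-4: choose parents in proportion to fitness, then apply
        # crossover and mutation. Adding 1 avoids an all-zero weight list.
        weights = [f + 1 for f in fitnesses]
        offspring = []
        while len(offspring) < n_replace:
            mother, father = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, length)
            child = mother[:cut] + father[cut:]
            child = "".join(
                ("1" if bit == "0" else "0") if rng.random() < mutation_rate else bit
                for bit in child)
            offspring.append(child)
        # Step 5: replace the least-fit GTYPES with the offspring.
        ranked = sorted(population, key=fitness_fn, reverse=True)
        population = ranked[:pop_size - n_replace] + offspring
        # Step 6: loop back to step 2.
    return max(population, key=fitness_fn)
```

Because the fittest strings are carried over unchanged in step 5, the best fitness in this particular sketch can never decrease from one generation to the next.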

THE CONTEXT OF ADAPTATION GA’s have traditionally been employed in the
contexts of machine learning and function optimization. In such contexts, one is often
looking for an explicit, optimal solution to a particular, well-specified problem. This
is reflected in the implementation of the evaluation of PTYPES in traditional GA’s:
each GTYPE is expressed as a PTYPE independently of the others, tested on the
problem, and assigned a value representing its individual fitness using an explicit
fitness function. Thus, one is often seeking to evolve an individual that explicitly
encodes an optimal solution to a precisely specified problem. The fitness of a GTYPE
in such cases is simply a function of the problem domain, and is independent of the
fitnesses of the other GTYPES in the population.

This  is  quite  different  from  the  context  in  which  natural  biological  evolution 
has  taken  place,  in  which  the  behavior  of  a  PTYPE  and  its  associated  fitness  are 
highly  dependent  on  which  other  PTYPES  exist  in  the  environment,  and  on  the 
dynamics  of  their  interactions.  Furthermore,  in  the  natural  context,  it  is  generally 
the case that there is no single, explicitly specified problem confronting the
population. Rather, there is often quite a large set of problems facing the population
at  any  one  time,  and  these  problems  are  only  implicitly  determined  as  a  function 
of  the  dynamics  of  the  population  and  the  environment  themselves,  which  may 
change  significantly  over  time.  In  such  a  context,  nature  has  often  discovered  that 
the  collective  behavior  emerging  from  the  interactions  among  a  set  of  PTYPES  will 
address  a  subset  of  the  implicitly  defined  problems. 

Thus, the proper picture for the natural evolutionary context is that of a large
cloud  of  implicit  collective  solutions  addressing  a  large  cloud  of  implicit  collective 
problems.  Both  of  these  clouds  are  implicit  in  the  spatio-temporal  dynamics  of  the 
population. 

The  dynamics  of  such  systems  are  very  complex  and  impossible  to  predict.  One 
can  think  of  them  as  the  dynamical  equivalent  of  many-body  orbital  mechanics 
problems;  two-body  problems  can  be  treated  analytically,  whereas  three-  or  more 
body  problems  are  nonanalytic. 

The important point here is that nonlinearities and emergent collective
phenomena are properties that are to be exploited, rather than avoided as has been
the traditional engineering viewpoint. Emergent nonlinear solutions may be harder
to understand or to engineer, but there are far more of them than there are
nonemergent, analyzable linear solutions. The true power of evolution lies in its ability
to  exploit  emergent  collective  phenomena;  it  lies,  in  fact,  in  evolution’s  inability  to 
avoid  such  phenomena. 


7.5  FROM  ARTIFICIAL  SELECTION  TO  NATURAL  SELECTION 

In The Origin of Species, Darwin used a very clever device to argue for the agency of
natural selection. In the first chapter of Origin, Darwin lays the groundwork of the
case for natural selection by carefully documenting the process of artificial selection.
Most people of his time were familiar with the manner in which breeders of domestic
animals and plants could enhance traits arbitrarily by selective breeding of their
stock. Darwin carefully made the case that the wide variety of domestic animals
and plants extant at his time were descended from a much smaller variety of wild
stock, due to the selective breedings imposed by farmers and herders throughout
history.

Now, Darwin continues, simply note that environmental circumstances can fill
the role played by the human breeder in artificial selection, and voila! one has
natural selection. The rest of the book consists in a very careful documentation of the
manner in which different environmental conditions would favor animals bearing
different traits, making it more likely that individuals bearing those traits would
survive to mate with each other and produce offspring, leading to the gradual
enhancement of those traits through time. A beautifully simple yet elegant mechanism
to explain the origin and maintenance of the diversity of species on Earth: too
simple for many of his time, particularly those of strong religious persuasion.

The abstraction of this simple elegant mechanism for the production and
filtration of diversity in the form of the Genetic Algorithm is straightforward and
obvious. However, as it is usually implemented, it is artificial, rather than natural,
selection that is the agency determining the direction of computer evolution.
Either we ourselves, or our algorithmic agents in the form of explicit fitness functions,
typically stand in the role of the breeder in computer implementations of evolution.
Yet it is plain that the role of “breeder” can as easily be filled by “nature” in the
world inside the computer as it is in the world outside the computer; it is just a
different “nature.”

In the following sections, we will explore a number of examples of
computational implementations of the evolutionary process, starting with examples that
clearly involve artificial selection, and working our way through to an example that
clearly involves natural selection. The key thing to keep track of throughout these
examples is the manner in which we incrementally give over our role as breeder to
the “natural” pressures imposed by the dynamics of the computational world itself.

A BREEDER’S PARADISE: BIOMORPHS The first model, a clear-cut example of
computational artificial selection, is due to the Oxford evolutionary biologist, Richard
Dawkins, author of such highly regarded books as The Selfish Gene, The Extended
Phenotype, and The Blind Watchmaker.

In order to illustrate the power of a process in which the random production
of variation is coupled with a selection mechanism, Dawkins wrote a program for
the Apple Macintosh computer that allows users to “breed” recursively generated
objects.

The program is set up to generate tree structures recursively by starting with
a single stem, adding branches to it in a certain way, adding branches to those
branches in the same way, and so on. The number of branches, their angles, their
size relative to the stem they are being added to, the number of branching iterations,
and other parameters affecting the growth of these trees are what constitute the
GTYPES of the tree organisms, or “biomorphs” as Dawkins calls them. Thus,
the program consists of a general purpose recursive tree generator, which takes
an organism's GTYPE (parameter settings) as data and generates its associated
PTYPE (the resulting tree).
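To make the GTYPE-to-PTYPE mapping concrete, here is a hedged Python sketch of a recursive tree generator. The four gene names (`branches`, `angle`, `scale`, `depth`) and the geometry are assumptions based only on the parameters listed above; they are not Dawkins's actual genes or drawing rules.

```python
import math


def grow_tree(gtype, x=0.0, y=0.0, heading=90.0, length=1.0, depth=None):
    """Render a GTYPE (a dict of parameters) as a PTYPE: a list of line
    segments ((x0, y0), (x1, y1)) describing the tree."""
    if depth is None:
        depth = gtype["depth"]                   # number of branching iterations
    x1 = x + length * math.cos(math.radians(heading))
    y1 = y + length * math.sin(math.radians(heading))
    segments = [((x, y), (x1, y1))]              # the stem itself
    if depth == 0:
        return segments
    n = gtype["branches"]
    for k in range(n):
        # Spread the n branches evenly between -angle and +angle.
        offset = 0.0 if n == 1 else gtype["angle"] * (2 * k / (n - 1) - 1)
        segments += grow_tree(gtype, x1, y1, heading + offset,
                              length * gtype["scale"], depth - 1)
    return segments


# A hypothetical default parameter setting, standing in for a starting tree.
adam = {"branches": 2, "angle": 30.0, "scale": 0.7, "depth": 3}
```

Changing a single gene, for instance raising `depth` from 3 to 4, leaves the generator untouched but roughly doubles the number of segments in the resulting tree, which is the sense in which small GTYPE changes can produce large PTYPE changes.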

The program starts by producing a simple default (or “Adam”) tree and then
produces a number of mutated copies of the parameter string for the Adam tree.
The program renders the PTYPE trees for all of these different mutants on the
screen for the user to view. The user then selects the PTYPE (i.e., tree shape)
he or she likes the best, and the program produces mutated copies of that tree's
GTYPE, and renders the associated PTYPES. The user selects another tree, and
the process continues. The original Adam tree together with a number of its distant
descendants are shown in Figure 11.

It is clear that this is a process of artificial selection. The computer generates the
variants, but the human user fills the role of the “breeder,” the active selective agent,
determining which structures are to go on to produce variant offspring. However,
the mechanics of the production of variants are particularly clear: produce slight
variations on the presently selected GTYPE. The specific action taken by the human
breeder is also very clear: choose the PTYPE whose GTYPE will have variations of
it produced in the next round. There is both a producer and a selector of variation.

FIGURE 11 (a) Dawkins’s original Adam tree, and (b) a number of its distant
descendants.®


ALGORITHMIC BREEDERS In this section, we investigate a model which will take
us two steps closer to natural selection. First, the human breeder is taken out of
the loop, replaced by a program he writes, which formalizes his selection criteria,
so that the act of selection can be performed by his computational agent. Second,
we see that our computational representative can itself be allowed to evolve, an
important first step toward eliminating our externally imposed, a priori criteria
from the process completely.

The system we discuss here is due to Danny Hillis, inventor of the Connection
Machine and chief scientist of Thinking Machines Corporation. In the course of their
work at TMC, Hillis and his colleagues need to design fast and efficient chips for the
hardware implementation of a wide variety of common computational tasks, such as
sorting numbers. For many of these, there is no body of theory that tells engineers how to
construct the optimal circuit to perform the task in question. Therefore, progress
in the design of such circuits is often a matter of blind trial and error until a
better circuit is discovered. Hillis decided to apply the trial-and-error procedure of
evolution to the problem of designing sorting circuits.

In his system, the GTYPES are strings of numbers encoding circuit connections
that implement comparisons and swaps between input lines. GTYPES are rendered
into the specific circuits they encode (their PTYPES), and they are rated according
to the number of circuit elements and connections they require, and by their
performance on a number of test strings which they have to sort. This rating is
accomplished by an explicit fitness function (Hillis’ computational representative),
which implements the selection criteria and takes care of the breeding task. Thus,
this is still a case of artificial selection, even though there is no human being actively
doing the selection.
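A small Python sketch of such an evaluation, under simplifying assumptions: the GTYPE is taken to be a list of (i, j) pairs, each meaning "compare lines i and j and swap them if they are out of order," and the rating is reported as the fraction of test cases sorted together with the number of comparators. The names and the exact rating are hypothetical and are not Hillis's encoding or fitness function.

```python
def apply_network(network, values):
    """Run a list of compare-and-swap operations over a copy of `values`."""
    values = list(values)
    for i, j in network:
        if values[i] > values[j]:                # out of order: swap the two lines
            values[i], values[j] = values[j], values[i]
    return values


def score_network(network, test_cases):
    """Return (fraction of test cases sorted correctly, number of comparators)."""
    solved = sum(apply_network(network, case) == sorted(case) for case in test_cases)
    return solved / len(test_cases), len(network)


# A well-known five-comparator network for four inputs.
four_sorter = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
```

A breeder program would then prefer networks that sort every test case while using as few comparators as possible.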

Hillis implemented the evolution problem on his Connection Machine CM-2,
a 64K-processor SIMD parallel supercomputer. With populations of 64K sorting
networks  over  thousands  of  generations,  the  system  managed  to  produce  a  65- 
element  sorter,  better  than  some  cleverly  engineered  sorting  networks,  but  not  as 
good  as  the  best  known  such  network,  which  has  60  components.  After  reaching 
65-element  sorters,  the  system  consistently  became  stuck  on  local  optima. 

Hillis then borrowed a trick from the biological literature on the coevolution
of hosts and parasites (specifically Hamilton®') and in the process took a step
closer to natural selection by allowing the evaluation function to evolve in time.
In the previous runs, the sorting networks were evaluated on a fixed set of sorting
problems: random sequences of numbers that the networks had to sort into correct
order. In the new set of runs, Hillis made another evolving population out of the
sorting problems. The task for the sorting networks was to do a good job on the
sorting problems, while the task for the sorting problems was to make the sorting
networks perform poorly.

In  this  situation,  whenever  a  good  sorting  network  emerged  and  took  over  the 
population,  it  became  a  target  for  the  population  of  sorting  problems.  This  led 
to  the  rapid  evolution  of  sorting  sequences  that  would  make  the  network  perform 
poorly  and  hence  reduce  its  fitness.  Hillis  found  that  this  coevolution  between  the 
sorting  networks  and  the  sorting  problems  led  much  more  rapidly  to  better  solutions 
than  had  been  achieved  by  the  evolution  of  sorting  networks  alone,  resulting  in  a 
sorting  network  consisting  of  61  elements. 

It is the coevolution in this latter set of runs that both brings us one step closer
to natural selection and is responsible for the enhanced efficiency of the search for
an optimal sorting network. First of all, rather than having an absolute, fixed value,
the fitness of a sorting network depends on the specific set of sorting problems it
is facing. Likewise, the fitness of a set of sorting problems depends on the specific
set of sorting networks it is facing. Thus, the “fitness” of an individual is now a
relative quantity, not an absolute one. The fitness function depends a little more
on the “nature” of the system; it is an evolving entity as well.
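The relative nature of fitness in the coevolutionary runs can be illustrated with a short Python sketch. It is a deliberate simplification (every network meets every problem, and scores are simple counts), and the function names are assumptions; Hillis's actual pairing and scoring scheme is not described in enough detail here to reproduce.

```python
def apply_network(network, values):
    """Run a list of compare-and-swap operations over a copy of `values`."""
    values = list(values)
    for i, j in network:
        if values[i] > values[j]:
            values[i], values[j] = values[j], values[i]
    return values


def coevolution_fitness(networks, problems):
    """Score both populations against each other.

    A network earns a point for every problem it sorts; a problem earns a
    point for every network it defeats."""
    network_fitness = [0] * len(networks)
    problem_fitness = [0] * len(problems)
    for n, network in enumerate(networks):
        for p, problem in enumerate(problems):
            if apply_network(network, problem) == sorted(problem):
                network_fitness[n] += 1
            else:
                problem_fitness[p] += 1
    return network_fitness, problem_fitness
```

The same network can therefore receive a high score against one population of problems and a low score against another, which is what it means for fitness to be a relative rather than an absolute quantity.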

Coevolution increases the efficiency of the search as follows. In the earlier runs
consisting solely of an evolving population of sorting networks, the population of
networks was effectively hill climbing on a multi-peaked fitness landscape.
Therefore, the populations would encounter the classic problem of getting stuck on local
maxima. That is, a population could reach certain structures which lie on relatively
low fitness peaks, but from which any deviations result in lower fitness, which is
selected against. In order to find another, higher peak, the population would have
to cross a fitness valley, which is difficult to do under simple Darwinian selection
(Figure 13(a)).

FIGURE 12 An evolved sorting network showing sequencing of comparisons and
swaps. Original figure appeared in "Co-Evolving Parasites Improve Simulated Evolution
as an Optimization Procedure" by W. D. Hillis, in Artificial Life II, edited by C. G.
Langton, C. Taylor, J. D. Farmer, and S. Rasmussen, (Redwood City, CA: Addison-
Wesley, 1991).^^


In the coevolutionary case, here’s what happens (Figure 13(b)). When a
population of sorting networks gets stuck on a local fitness peak, it becomes a target for
the  population  of  sorting  problems.  That  is,  it  defines  a  new  peak  for  the  sorting 
problems  to  climb.  As  the  sorting  problems  climb  their  peak,  they  drive  down  the 
peak  on  which  the  sorting  networks  are  sitting,  by  finding  sequences  that  make  the 
sorting  networks  perform  poorly,  therefore  lowering  their  fitness.  After  a  while,  the 
fitness  peak  that  the  sorting  networks  were  sitting  on  has  been  turned  into  a  fitness 
valley,  from  which  the  population  can  escape  by  climbing  up  the  neighboring  peaks. 
As  the  sorting  networks  climb  other  peaks,  they  drive  down  the  peak  that  they  had 
provided  for  the  sorting  problems,  which  will  then  chase  the  sorting  networks  to 
the  new  peaks  they  have  achieved  and  drive  those  down  in  turn. 

In short, each population dynamically deforms the fitness landscape being
traversed by the other population in such a way that both populations can continue to
climb uphill without getting stuck on local maxima. When they do get stuck, the
maxima  get  turned  into  minima  which  can  be  climbed  out  of  by  simple  Darwinian 
means.  Thus,  coupled  populations  evolving  by  Darwinian  means  can  bootstrap  each 
other  up  the  evolutionary  ladder  far  more  efficiently  than  they  can  climb  it  alone.  By 
competing  with  one  another,  coupled  populations  improve  one  another  at  increased 
rates. 

FIGURE 13 (a) Population of sorting networks stuck on a local fitness peak. In order
to attain the higher peak, the population must cross a fitness “valley,” which is difficult
to achieve under normal Darwinian mechanisms. (b) The coevolving parasites deform
the fitness landscape of the sorting networks, turning the fitness peak into a fitness
valley, from which it is easy for the population to escape.


TABLE 1 The payoff matrix for the Prisoner’s Dilemma Game. The pair (s1, s2)
denotes the scores to players A and B, respectively.

                            Player B
                        Cooperate   Defect
Player A   Cooperate    (3,3)       (0,5)
           Defect       (5,0)       (1,1)

Thus, when coupled in this way, a population may get hung up on local optima
for  a  while,  but  eventually  it  will  be  able  to  climb  again.  This  suggests  immediately 
that  the  structure  of  the  evolutionary  record  for  such  systems  should  show  periods 
of  stasis  followed  by  periods  of  evolutionary  change.  The  stasis  comes  about  as 
populations  sit  at  the  top  of  local  fitness  peaks,  waiting  around  for  something  to 
come  along  and  do  them  the  favor  of  lowering  the  peaks  they  are  stuck  on.  The 
periods  of  change  come  about  when  populations  are  released  from  local  optima  and 
are  freed  to  resume  climbing  up  hills,  and  are  therefore  changing  in  time.  Hillis  has, 
in  fact,  carefully  documented  this  kind  of  Punctuated  Equilibria  in  his  system. 

COMPUTATIONAL  ECOLOGIES  Continuing  on  our  path  from  artificial  to  natural 
selection,  we  turn  to  a  research  project  carried  out  by  Kristian  Lindgren,^^  in  which, 
although  there  is  still  an  explicit  fitness  measure,  many  different  species  of  organisms 
coevolve  in  each  other’s  presence,  forming  ecological  webs  allowing  for  more  complex 
interactions  than  the  simple  host-parasite  interactions  described  above. 

In  this  paper,  Lindgren  studies  evolutionary  dynamics  within  the  context  of  a 
well-known  game-theoretic  problem:  the  Iterated  Prisoner’s  Dilemma  model  (IPD). 
This  model  has  been  used  effectively  by  Axelrod  and  Hamilton  in  their  studies  of 
the  evolution  of  cooperation.^’^ 

In the prisoner’s dilemma model, the payoff matrix (the fitness function) is
constructed in such a way that individuals will garner the most payoff collectively
in the long run if they “cooperate” with one another by avoiding the behaviors that
would garner them the most payoff individually in the short run. If individuals only
play the game once, they will do best by not cooperating (“defecting”). However,
if they play the game repeatedly with one another (the “iterated” version of the
game), they will do best by cooperating with one another.

The payoff matrix for the prisoner’s dilemma game is shown in Table 1. This
payoff matrix has the following interesting property. Assume, as is often assumed
in game theory, that each player wants to maximize his immediate payoff, and let's
analyze what player A should do. If B cooperates, then A should defect, because
then A will get a score of 5 whereas he only gets a score of 3 if he cooperates. On
the other hand, if B defects, then again, A should defect, as he will get a score of
1 if he defects while he only gets a score of 0 if he cooperates. So, no matter what
B does, A maximizes his immediate payoff by defecting. Since the payoff matrix is
symmetric, the same reasoning applies to player B, so B should defect no matter
what A does. Under this reasoning, each player will defect at each time step, giving
them 1 point each per play. However, if they could somehow decide to cooperate,
they would each get 3 points per play: the two players will do better in the long
run by forgoing the action that maximizes their immediate payoff.
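The dominance argument can be checked mechanically. The short Python sketch below encodes the payoffs of Table 1 and asks which move maximizes A's immediate score against each possible move by B; the dictionary layout and function name are assumptions for illustration.

```python
# Payoffs from Table 1, keyed by (A's move, B's move); values are (score A, score B).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def best_reply_for_a(b_move):
    """Return the move that maximizes A's immediate payoff against b_move."""
    return max("CD", key=lambda a_move: PAYOFF[(a_move, b_move)][0])
```

Defection comes out as A's best reply whichever move B makes, even though mutual cooperation pays each player 3 points per play against 1 point each for mutual defection.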

The question is, of course, can ordinary Darwinian mechanisms, which assume
that individuals selfishly want to maximize their immediate payoff, lead to
cooperation? Surprisingly, as demonstrated by Axelrod and Hamilton, the answer is
yes.

Strategy name

[00] All Defect
[01] TIT-for-TAT (TFT)
[10] TAT-for-TIT (anti-TFT)
[11] All Cooperate

FIGURE 14 Four possible memory 1 strategies.


In Lindgren’s version of this game, strategies can evolve in an open-ended
fashion by learning to base their decisions on whether to cooperate or defect upon longer
and longer histories of previous interactions.

The scheme used by Lindgren to represent strategies to play the Iterated
Prisoner’s Dilemma game is as follows. In the simplest version of the game, players make
their choice of whether to cooperate or defect based solely on what their opponent
did to them in the last time step. This is called the memory 1 game. Since the
opponent could have done only one of two things, cooperate or defect, a strategy
needs to specify what it would do in either of those two cases. As it has two moves
it can make in either of those two cases, cooperate or defect, there are four possible
memory 1 strategies. These can be encoded in bit strings of length 2, as illustrated
in Figure 14.
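The two-bit encoding can be exercised with a short Python sketch. It reads the genomes of Figure 14 as follows: a 1 means cooperate, a 0 means defect, and the bit at position k is the reply to an opponent whose last move was k. The assumed opening history (`first_seen=1`, i.e., both players behave as if the opponent had just cooperated) and the function name are illustrative choices, not part of Lindgren's model as described here.

```python
# Payoffs from Table 1 with 1 = cooperate and 0 = defect, keyed by (move A, move B).
PAYOFF = {(1, 1): (3, 3), (1, 0): (0, 5), (0, 1): (5, 0), (0, 0): (1, 1)}


def play_memory1(genome_a, genome_b, rounds, first_seen=1):
    """Play two memory 1 strategies against each other and return both scores.

    genome[k] is the reply to an opponent whose last move was k."""
    seen_by_a = seen_by_b = first_seen           # assumed opening history
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = int(genome_a[seen_by_a])
        move_b = int(genome_b[seen_by_b])
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        seen_by_a, seen_by_b = move_b, move_a    # each remembers the other's move
    return score_a, score_b
```

Under these assumptions two TIT-for-TAT players ("01") cooperate throughout, whereas All Defect ("00") exploits TIT-for-TAT only once before both settle into mutual defection.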

If the players also base their decisions on one more move into the past,
looking at what they themselves did to their opponent before their opponent made
his move, then we have the memory 2 game. In this case, there are two moves with two
possible outcomes each, meaning that a memory 2 strategy must specify whether to
cooperate or defect for each of four possible cases. Such a strategy can be encoded
using four bits, twice the length of the memory 1 strategies, so there will be 16
possible memory 2 strategies. Memory 3 strategies require another doubling of the
encoding bit string, i.e., 8 bits, yielding 256 possible strategies. In general, memory
n strategies require $2^n$ bits for their encoding, and there will be $2^{2^n}$ such strategies.
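The counting argument is easy to reproduce. The following Python sketch (with hypothetical function names) computes the genome length and the number of distinct strategies for a given memory.

```python
def genome_length(memory):
    """Bits needed to specify a reply to each of the 2**memory possible histories."""
    return 2 ** memory


def strategy_count(memory):
    """Number of distinct bit strings of that length, i.e. 2**(2**memory)."""
    return 2 ** genome_length(memory)
```

For memories 1, 2, and 3 this gives genome lengths of 2, 4, and 8 bits and 4, 16, and 256 strategies, matching the figures in the text; the count grows very quickly, reaching 65,536 strategies at memory 4.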

In order to allow for the evolution of higher memory strategies, Lindgren
introduces a new genetic operator: gene duplication. As a memory n strategy is just
twice as long as a memory n - 1 strategy, a memory n strategy can be produced
from a memory n - 1 strategy by simply duplicating the memory n - 1 strategy and
concatenating the duplicate to itself. In Lindgren’s encoding scheme, gene
duplication has the interesting property that it is a neutral mutation. Simple duplication
alone does not change the PTYPE, even though it has doubled the length of the
GTYPE. However, once doubled, mutations in the longer GTYPE will alter the
behavior of the PTYPE.
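Why duplication is neutral can be seen in a small Python sketch. It assumes that the recent history is read as a binary number with the oldest remembered move as the most significant bit, and that this number indexes the genome; the function names and this indexing convention are assumptions for illustration rather than a verified account of Lindgren's encoding.

```python
def respond(genome, history):
    """Look up the reply of a strategy.

    `history` is a sequence of 0/1 moves, oldest first. Only the last m moves
    are used, where the genome has 2**m bits."""
    memory = len(genome).bit_length() - 1        # 2 bits -> 1, 4 bits -> 2, ...
    recent = history[-memory:]
    index = int("".join(str(move) for move in recent), 2)
    return int(genome[index])


def duplicate(genome):
    """Gene duplication: concatenate the genome with a copy of itself."""
    return genome + genome
```

After duplication the extra, older move only selects between two identical halves of the genome, so the reply is unchanged for every history. A later point mutation in one half, for example turning "0101" into "0111", makes the two halves differ, and the older move then begins to influence behavior.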

Once  again,  evolution  proceeds  by  allowing  populations  of  different  organisms 
to bootstrap each other up coupled fitness landscapes, dynamically deforming each
other’s  landscapes  by  turning  local  maxima  into  local  minima.  Again,  the  fitness 
of  strategies  is  not  an  absolute  fixed  number  that  is  independently  computable. 
Rather,  the  fitness  of  each  strategy  depends  on  what  other  strategies  exist  in  the 
“natural”  population. 

Many complicated and interesting strategies evolve during the evolutionary
development of this system. More important, however, are the various
phenomenological features exhibited by the dynamics of the evolutionary process. First of all,
as we might expect, the system exhibits a behavior that is remarkably suggestive of
Punctuated Equilibria. After an initial irregular transient, the system settles down
to relatively long periods of stasis “punctuated” irregularly by periods of rapid
evolutionary change (Figure 15).

Second, the diversity of strategies builds up during the long periods of stasis,
but often collapses drastically during the short, chaotic episodes of rapid
evolutionary succession (Figure 16). These “crashes” in the diversity of species constitute
“extinction events.” In this model, these extinction events are observed to be a
natural consequence of the dynamics of the evolutionary process alone, without
invoking any catastrophic, external perturbations (there are no comet impacts or
“nemesis” stars in this model!). Furthermore, these extinction events happen on
multiple scales; there are lots of little ones and fewer large ones.


FIGURE 15 During evolutionary development, the system settles down to relatively
long periods of stasis “punctuated” irregularly by periods of rapid evolutionary change.
Original figure appeared in “Evolutionary Phenomena in Simple Dynamics” by K.
Lindgren, in Artificial Life II, edited by C. G. Langton, C. Taylor, J. D. Farmer, and
S. Rasmussen, (Redwood City, CA: Addison-Wesley, 1991).^^

FIGURE 16 (cont'd.) The evolutionary dynamics of strategies in the iterated
prisoner’s dilemma system of Lindgren. In both cases, the top trace plots the changing
concentration of strategies in the population while the bottom trace shows two things:
the solid line plots the average fitness of the population, while the dotted line plots
the diversity of species (the number of different strategies in the population at any
time). In all cases, time is traced on the horizontal axis. The top traces illustrate the
interplay between metastable and chaotic episodes, while the bottom traces illustrate
the “extinction events” that are often associated with the end of metastable periods.
These extinction events can be quite large, as is seen in the bottom trace.


This  is  important  because  in  order  to  understand  the  dynamics  of  a  system 
that  is  subjected  to  constant  perturbations,  one  needs  to  understand  the  dynamics 
of  the  unperturbed  system  first.  We  do  not  have  access  to  an  unperturbed  version 
of  the  evolution  of  life  on  Earth;  consequently,  we  could  not  have  said  definitively 
that  extinction  events  on  many  size  scales  would  be  a  natural  consequence  of  the 
process  of  evolution  itself.  By  comparing  the  perturbed  and  unperturbed  versions 
of  model  systems  like  Lindgren’s,  we  may  very  well  be  able  to  derive  a  universal 
scaling  relationship  for  “natural”  extinction  events,  and  therefore  be  able  to  explain 
deviations  from  this  relationship  in  the  fossil  record  as  due  to  external  perturbations 
such  as  the  impact  of  large  asteroids. 

Third,  the  emergence  of  ecologies  is  nicely  demonstrated  by  Lindgren’s  model. 
It  is  usually  the  case  that  a  mix  of  several  different  strategies  dominates  the  system 
during  the  long  periods  of  stasis.  In  order  for  a  strategy  to  do  well,  it  must  do  well  by 
cooperating  with  other  strategies.  These  mixes  may  involve  three  or  more  strategies 
whose  collective  activity  produces  a  stable  interaction  pattern  that  benefits  all  of 
the  strategies  in  the  mix.  Together,  they  constitute  a  more  complex,  “higher  order” 
strategy,  which  can  behave  as  a  group  in  ways  that  are  impossible  for  any  individual 
strategy. 

It  is  important  to  note  that,  in  many  cases,  the  “environment”  that  acts  on  an 
organism,  and  in  the  context  of  which  an  organism  acts,  is  primarily  constituted 
of  the  other  organisms  in  the  population  and  their  interactions  with  each  other 
and  the  physical  environment.  There  is  tremendous  opportunity  here  for  evolution 
to  discover  that  certain  sets  of  individuals  exhibit  emergent,  collective  behaviors 
that  reap  benefits  to  all  of  the  individuals  in  the  set.  Thus,  evolution  can  produce 
major leaps in biological complexity without having to produce more complex
individuals, simply by discovering, perhaps even “tripping over,” the many ways in
which  collections  of  individuals  at  one  level  can  work  together  to  form  aggregate 
individuals  at  the  next  higher  level  of  organization.'* 

This  is  thought  to  be  the  case  for  the  origin  of  eukaryotic  cells,  which  are  viewed 
as  descended  from  early  cooperative  collections  of  simpler,  prokaryotic  cells.*®  It 
is also the process involved in the origin of multicellular organisms, which led to
the  Cambrian  explosion  of  diversity  some  700  million  years  ago.  It  was  probably 
a  significant  factor  in  the  origin  of  the  prokaryotes  themselves,  and  it  has  been 
discovered independently at least seven times by the various social insects (including
species  of  wasps,  bees,  ants,  and  termites). 

The  final  step  in  eliminating  our  hand  from  the  selection/breeding  process  and 
setting the stage for true “natural” selection within a computer is taken in a model
due  to  Tom  Ray.*^  This  step  involves  eliminating  our  algorithmic  breeding  agent 
completely. 

In  his  “Tierra”  simulation  system,  computer  programs  compete  for  CPU  time 
and  memory  space.  The  “task”  that  these  programs  must  perform  in  order  to  be 
reproduced  is  simply  the  act  of  self-reproduction  itself!  Thus,  there  is  no  need  for 
an  externally  defined  fitness  function  that  determines  which  GTYPES  get  copied 
by  an  external  copying  procedure.  The  programs  reproduce  themselves,  and  the 
ones  that  are  better  at  this  task  take  over  the  population.  The  whole  external  task 
of  evaluation  of  fitness  has  been  internalized  in  the  function  of  the  organisms  them¬ 
selves.  Thus,  there  is  no  longer  a  place  for  the  human  breeder  or  his  computational 
agent.  This  results  in  genuine  natural  selection  within  a  computer. 

In  Tierra,  programs  replicate  themselves  “noisily,”  so  that  some  of  their  off¬ 
spring  behave  differently.  Variant  programs  that  reproduce  themselves  more  effi¬ 
ciently,  which  trick  other  programs  into  reproducing  them,  or  which  capture  the 
execution  pointers  of  other  programs,  etc.,  will  leave  more  offspring  than  others. 
Similarly,  programs  that  learn  to  defend  themselves  against  such  tricks  will  leave 
more  offspring  than  those  that  do  not. 

We  will  discuss  a  few  of  the  “digital  organisms”  that  have  emerged  within  the 
Tierra system (it is not necessary to understand the code in the illustrated programs
in  order  to  follow  the  explanation  in  the  text.)I®l 

Figure  17(a)  shows  the  self-replicating  “ancestor”  program  that  is  the  only 
program  Tom  Ray  has  ever  written  in  the  Tierra  system.  All  the  other  programs 
evolved  under  the  action  of  natural  selection. 

The  ancestor  program  works  as  follows.  In  the  top  block  of  code,  the  program 
locates  its  “head”  and  its  “tail,”  templates  marking  the  upper  and  lower  boundaries 
of  the  program  in  memory.  It  saves  these  locations  in  special  registers  and,  after 
subtracting  the  location  of  the  head  from  the  location  of  the  tail,  it  stores  its  length 
in  another  register. 

In  the  second  block  of  code,  the  program  enters  an  endless  loop  in  which  it  will 
repeatedly  produce  copies  of  itself.  It  allocates  memory  space  of  the  appropriate 
size  and  then  invokes  the  final  block  of  code,  which  is  the  actual  reproduction  loop. 
After  it  returns  from  the  reproduction  loop,  it  creates  a  new  execution  pointer  to 
its  newly  produced  offspring,  and  cycles  back  to  create  another  offspring. 

In  the  third  and  final  block  of  code,  the  reproduction  loop,  the  program  copies 
itself,  instruction  by  instruction,  into  the  newly  allocated  memory  space,  making 
use  of  the  addresses  and  length  stored  away  by  the  first  block  of  code.  When  it  has 
copied  itself  completely,  it  returns  to  the  block  of  code  that  called  it,  in  this  case, 
the  second  block. 
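The three blocks just described can be sketched in ordinary code. What follows is a minimal illustration in Python, not Tierra's actual instruction set: the list `soup` standing in for memory, the function name `replicate`, the instruction names, and the choice to allocate the offspring at the end of the list are all assumptions made for this example.

```python
def replicate(soup, head, tail):
    """Copy the program stored in soup[head:tail] into freshly allocated cells.

    `head` is the index of the first instruction and `tail` the index just
    past the last one (a convention assumed here, not taken from Tierra).
    """
    # Block 1: work out the program's own length from its boundaries.
    length = tail - head

    # Block 2: allocate space for the offspring (here: grow the list).
    child_head = len(soup)
    soup.extend([None] * length)

    # Block 3: the reproduction loop, copying one instruction at a time.
    for offset in range(length):
        soup[child_head + offset] = soup[head + offset]

    # The parent would now start a new execution pointer at child_head.
    return child_head, child_head + length


# Hypothetical three-instruction "program" occupying cells 0..2.
soup = ["find_ends", "allocate", "copy_loop"]
child = replicate(soup, 0, 3)
```

In the real system the copy is "noisy," so an occasional instruction would be altered on its way into the offspring; that step is left out here to keep the sketch short.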


l®lThe  details  are  to  be  found  in  Ray.®* 
It should be noted that “function calls” in Tierra are accomplished by seeking
for a specific bit pattern in memory rather than by branching to a specific
address. Thus, when the second block of code “calls” the third block of code, the
reproduction loop, it does so by initiating a seek forward in memory for a specific
“template.” When this template is found, execution begins at the instruction
following the template. Returns from function calls are handled in the normal manner,
by simply returning to the instruction following the initial function call. This
template addressing scheme is used in other reference contexts as well, and helps
make Tierra language programs robust to mutations, as well as easily relocatable
in memory.

Figure  17(b)  shows  a  “parasite”  program  that  has  evolved  to  exploit  the  an¬ 
cestor  program.  The  parasite  is  very  much  like  the  ancestor  program,  except  that 
it  is  missing  the  third  block  of  code,  the  reproduction  loop.  How  then  does  it  copy 
itself? 

The  answer  is  that  it  makes  use  of  a  nearby  ancestor  program’s  reproduction 
loop! Recall that a function call in Tierra initiates a seek forward in memory for
a particular template of bits. If this pattern is not found within the initiating
program’s own code, the search may proceed forward in memory into the code
of other organisms, where the template may be found and where execution then
begins. When the invoked function in another organism’s code executes the “return”
statement, execution reverts to the program that initiated the function call. Thus,
organisms can execute each other’s code, and this is exactly what the parasite
program  does:  it  makes  use  of  the  reproductive  machinery  of  the  ancestor  host. 
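The seek-forward call and the parasite's use of it can be illustrated with a toy model. This is only a sketch under stated assumptions: memory is a Python list, the instruction names and the `COPY_LOOP` template are invented for the example, and Tierra's actual template-matching rules are not modeled.

```python
def seek_forward(memory, start, template):
    """Return the index just past the first match of `template` at or after `start`.

    Returns None if the pattern never occurs. The search is free to run past
    the end of the calling program into whatever follows it in memory.
    """
    n = len(template)
    for i in range(start, len(memory) - n + 1):
        if memory[i:i + n] == template:
            return i + n  # execution begins just after the template
    return None


# Hypothetical template marking the start of the reproduction loop.
COPY_LOOP = ["t1", "t1", "t0", "t0"]

# The parasite lacks the third block; the ancestor carries it.
parasite = ["find_ends", "alloc", "call_copy", "divide", "jump_back"]
ancestor = parasite + COPY_LOOP + ["copy_instr", "ret"]

# The parasite sits just before an ancestor in the shared memory.
soup = parasite + ancestor

call_site = soup.index("call_copy")       # the parasite's own call
return_address = call_site + 1            # where "ret" will send it back
target = seek_forward(soup, call_site + 1, COPY_LOOP)
found_alone = seek_forward(parasite, call_site + 1, COPY_LOOP)
```

Here `target` lands inside the neighboring ancestor's code, while `found_alone` is `None`: without a host nearby, the parasite's call finds nothing to execute.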

This  means  that  the  parasite  does  not  have  to  take  the  time  to  copy  the  code 
constituting  the  reproductive  loop,  and  hence  can  reproduce  more  rapidly,  as  it 
has  fewer  instructions  to  copy.  The  parasites  thus  proliferate  in  the  population. 
However,  they  cannot  proliferate  to  the  point  of  driving  out  the  ancestor  hosts 
altogether,  for  they  depend  on  them  for  their  reproductive  machinery.  Thus,  a 
balance  is  eventually  struck  optimizing  the  joint  system. 

Eventually,  however,  another  mutant  form  of  the  ancestor  emerges  which  has 
developed  an  immunity  to  the  parasites.  This  program  is  illustrated  in  Figure  17(c). 
Two  key  differences  from  the  ancestor  program  confer  the  immunity  to  the  parasite 
programs.  First,  instead  of  executing  a  “return”  instruction,  the  reproduction  loop 
instead  initiates  a  jump  back  in  memory  to  the  template  found  in  the  instruction 
that  calls  the  reproduction  loop.  This  has  the  same  effect  as  a  return  statement 
when  executed  by  the  immune  program,  but  has  a  very  different  effect  on  the 
parasite.  The  second  important  difference  is  that  following  the  cell  division  in  the 
second  block  (which  allocates  a  new  execution  pointer  to  the  offspring  just  created), 
the  program  jumps  back  to  the  beginning  of  the  first  block  of  code,  rather  than  to 
the  beginning  of  the  second  block.  Thus,  the  immune  program  constantly  resets  its 
head, tail, and size registers. This seems useless when considering only the immune
organism’s  own  reproduction,  but  let’s  see  what  happens  when  a  parasite  tries  to 
execute  the  reproduction  loop  in  an  immune  organism. 

When  a  parasite  attempts  to  use  the  immune  program’s  reproduction  code, 
the  new  jump  transfers  the  parasite’s  execution  pointer  to  the  second  block  of  the 
immune program’s code, rather than returning it to the second block of the parasite
code,  as  the  parasite  expects.  Then,  this  execution  pointer  is  further  re-directed  to 
the  first  block  of  the  immune  program,  where  the  registers  originally  containing  the 
head,  tail,  and  length  of  the  parasite  are  reset  to  contain  the  head,  tail,  and  length 
of  the  immune  organism.  The  immune  program  has  thus  completely  captured  the 
execution  pointer  of  the  parasite.  Having  lost  its  execution  pointer,  the  parasite 
simply  becomes  dormant  data  occupying  memory,  while  the  immune  program  now 
has  two  execution  pointers  running  through  it:  its  own  original  pointer,  plus  the 
pointer it captured from the parasite. Thus, the immune program now reproduces
twice  as  rapidly  as  before.  Once  they  emerge,  such  immune  programs  rapidly  drive 
the  parasites  to  extinction. 

Complex  interactions  between  variant  programs  like  those  described  above  con¬ 
tinue  to  develop  within  evolutionary  runs  in  Tierra.  From  a  uniform  population  of 
self-reproducing  ancestor  programs,  Ray,  a  tropical  biologist  by  training,  notes  the 
emergence  of  whole  “ecologies”  of  interacting  species  of  computer  programs.  Fur¬ 
thermore,  he  is  able  to  identify  many  phenomena  familiar  to  him  from  his  studies 
of  real  ecological  communities,  such  as  competitive  exclusion,  the  emergence  of  par¬ 
asites,  key-stone  predators  and  parasites,  hyper-parasites,  symbiotic  relationships, 
sociality,  “cheaters,”  and  so  forth. 
FIGURE 17 Digital organisms from Ray’s Tierra simulation system. (a) Self-reproducing
ancestor. (b) An early parasite of the ancestor. (c) A descendant of the
ancestor that is immune to the parasite.
Again, the actual “fitness” of an organism is a complex function of its interactions
with other organisms in the “soup.” Collections of programs can cooperate
to  enhance  each  other's  reproductive  success,  or  they  can  drive  each  other's  repro¬ 
ductive  success  down,  thus  lowering  fitness  and  kicking  the  population  off  of  local 
fitness  peaks. 

Not surprisingly, Ray, too, has noted periods of relative stasis punctuated by
periods of rapid evolutionary change, as complex ecological webs collapse and new
ones stabilize in their place. Systems like Ray’s Tierra capture the proper context
for evolutionary dynamics, and natural selection is truly at play here.


8.  CONCLUSION 

This  article  is  intended  to  provide  a  broad  overview  of  the  field  of  Artificial  Life,  its 
motivations,  history,  theory,  and  practice.  In  such  a  short  space,  it  cannot  hope  to 
go  into  depth  in  any  one  of  these  areas.  Rather,  it  attempts  to  convey  the  “spirit” 
of  the  Artificial  Life  enterprise  via  several  illustrative  examples  coupled  with  a  good 
deal  of  motivating  explanation  and  discussion. 

The  field  of  Artificial  Life  is  in  its  infancy,  and  is  currently  engaged  in  a  period 
of  extremely  rapid  growth,  which  is  producing  many  new  converts  to  the  principles 
detailed here. However, it is also raising a significant amount of controversy, and is
not without its critics. The notion of studying biology via the study of patently
nonbiological things is an idea that is hard for the traditional biological community to
accept. The acceptance of Artificial Life techniques within the biological community
will  be  directly  proportional  to  the  contributions  it  makes  to  our  understanding  of 
biological  phenomena. 

That  these  contributions  are  forthcoming,  I  have  no  doubt.  However,  high- 
quality  research  in  Artificial  Life  is  difficult,  because  it  requires  that  its  practitioners 
be  experts  in  both  the  computational  sciences  and  the  biological  sciences.  Either  of 
these  alone  is  a  full-time  career,  and  so  the  danger  lurks  of  doing  either  masterful 
biology but trivial computing, or doing masterful computing but trivial biology.

Therefore, I strongly suggest incorporating a trick from nature: cooperate! As
is  amply  illustrated  in  many  of  the  examples  discussed  in  this  article,  nature  of¬ 
ten  discovers  that  collections  of  individuals  easily  solve  problems  that  would  be 
extremely  difficult  or  even  impossible  for  individuals  to  solve  on  their  own.  Collab¬ 
orations  between  biologists  and  computer  scientists  are  quite  likely  to  be  the  most 
appropriate  vehicles  for  making  significant  contributions  to  our  understanding  of 
biology  via  the  pursuit  of  Artificial  Life. 

So,  if  you  are  a  computer  expert  dying  to  hack  together  an  evolution  program, 
go  find  yourself  a  top-notch  evolutionary  biologist  to  collaborate  with,  one  who  will 
bring  to  the  enterprise  an  in-depth  understanding  of  the  subtleties  of  the  evolution¬ 
ary  process  plus  a  proper  set  of  open  questions  about  evolution  towards  which  your 
evolution  program  might  be  addressed. 
On the other hand, if you are a field biologist interested in doing some numerical
simulations in order to understand the ecological dynamics you are observing
in the field, hook up with a top-notch parallel-computing expert, who will bring
to  the  enterprise  a  thorough  knowledge  of  the  subtleties  involved  in  multi-agent 
interactions,  and  will  be  in  possession  of  an  equally  open  set  of  questions,  which 
you  very  well  might  find  to  be  strikingly  related  to  your  own. 

Above all, when in doubt, turn to Mother Nature. After all, she is smarter than
you!
Order Parameters, Broken Symmetry, and
Topology


As  a  kid  in  elementary  school,  I  was  taught  that  there  were  three  states  of 
matter: solid, liquid, and gas. The ancients thought that there were four
(earth, water, air, and fire), which is considered sheer superstition. In junior
high,  I  remember  reading  a  book  called  The  Seven  States  of  Matter.  At  least 
one  was  “plasma,”  which  made  up  stars  and  thus  most  of  the  universe, 
and  which  sounded  rather  like  fire  to  me. 

The  original  three,  by  now,  have  become  multitudes.  In  important  and 
precise  ways,  magnets  are  a  distinct  form  of  matter.  Metals  are  different 
from  insulators.  Superconductors  and  superfluids  are  striking  new  states  of 
matter. The liquid crystal in your wristwatch is one of a huge family of
different liquid crystalline states of matter (nematic, cholesteric, blue
phase I, II, and blue fog, smectic A, B, C, C*, D, I, ...). There are over 200
qualitatively different types of crystals, not to mention the quasi-crystals
(Figure 1). There are disordered states of matter, like spin glasses, and
states like the fractional quantum-hall effect with excitations of charge e/3

^^^They hadn’t heard of dark matter back then.
(like quarks). Particle physicists tell us that the vacuum we live within has,
in the past, been in quite different states; in the last vacuum but one, there
were four different kinds of light (mediated by what is now the photon,
the W+, the W−, and the Z particle). We’ll discuss this more in the next
paper.

When  there  were  only  three  states  of  matter,  we  could  learn  about  each 
one and then turn back to learning long division. Now that there are multitudes,
though, we’ve had to develop a system. Our system is constantly
being  extended  and  modified,  because  we  keep  finding  new  phases  which 
don’t  fit  into  the  old  frameworks.  It’s  amazing  how  the  500th  new  state  of 
matter  somehow  screws  up  a  system  which  worked  fine  for  the  first  499. 
Quasi-crystals,  the  fractional  quantum-hall  effect,  and  spin  glasses  all  re¬ 
ally  stretched  our  minds  until  (1)  we  understood  why  they  behaved  the  way 
they  did,  and  (2)  we  understood  how  they  fit  into  the  general  framework. 

In this paper, I’m going to tell you the system. In the subsequent sections,
I’ll discuss some gaps in the system: materials and types of behavior which
don’t  fit  into  the  neat  framework  presented  here.  I’ll  try  to  maximize  the 
number  of  pictures  and  minimize  the  number  of  formulas,  but  there  are 
problems  and  ideas  that  I  don’t  understand  well  enough  to  explain  simply. 
Most  of  what  I  tell  you  in  this  paper  is  both  true  and  important.  Much 
of  what  is  contained  in  the  following  sections  represents  my  own  pet  ideas 
and  theories,  and  you  should  be  warned  not  to  take  my  messages  there  as 
gospel. 

The  system  consists  of  four  basic  steps.^  First,  you  must  identify  the  broken 
symmetry.  Second,  you  must  define  an  order  parameter.  Third,  you  are  told 
to  examine  the  elementary  excitations.  Fourth,  you  classify  the  topological 
defects.  Most  of  what  I  say  I  take  from  Mermin,^  Coleman,^  and  deGennes,^ 
and  I  heartily  recommend  these  excellent  articles  to  my  audience.  We  take 
each  step  in  turn. 


I.  IDENTIFY  THE  BROKEN  SYMMETRY 

What  is  it  which  distinguishes  the  hundreds  of  different  states  of  matter?  Why  do 
we  say  that  water  and  olive  oil  are  in  the  same  state  (the  liquid  phase),  while  we  say 
aluminum  and  (magnetized)  iron  are  in  different  states?  Through  long  experience, 
we’ve  discovered  that  most  phases  differ  in  their  symmetry. 

1^1 This is not to say that different phases always differ by symmetries! Liquids and gases have
the same symmetry. In fact, one can go continuously from a liquid to a gas, by going first to
high pressures and then heating. It is safe to say, though, that if the two materials have different
symmetries, they are different phases.
FIGURE 1 Quasi-crystals. Much of these two chapters will discuss the properties
of crystals. Crystals are surely the oldest known of the broken-symmetry phases of
matter, and remain the most beautiful illustrations. It’s amazing that in the past few
years, we’ve uncovered an entirely new class of crystals. Shown here is a photograph
of a quasi-crystalline metallic alloy, with icosahedral symmetry. Notice the five-pointed
stars: our old notions of crystals had to be completely revised to include this
type of symmetry. Photograph courtesy of Marc Audier, Ecole Nationale Supérieure
d’Electrochimie et d’Electrométallurgie de Grenoble.
FIGURE 2 Which is more symmetric? The cube has many symmetries. It can be
rotated by 90°, 180°, or 270° about any of the three axes passing through the faces.
It can be rotated by 120° or 240° about the corners and 180° about an axis passing
from the center through any of the 12 edges. The sphere, though, can be rotated by
any angle. The sphere respects rotational invariance: all directions are equal. The cube
is an object which breaks rotational symmetry: once the cube is there, some directions
are more equal than others.


Consider Figure 2, showing a cube and a sphere. Which is more symmetric?
Clearly,  the  sphere  has  many  more  symmetries  than  the  cube.  One  can  rotate  the 
cube  by  90°  in  various  directions  and  not  change  its  appearance,  but  one  can  rotate 
the  sphere  by  any  angle  and  keep  it  unchanged. 
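The cube's rotations can be counted explicitly. The sketch below is an illustration rather than anything from the original lecture: it relies on the standard fact that the rotations of a cube centered at the origin are the 3×3 signed permutation matrices with determinant +1, and it sorts them by trace, using trace = 1 + 2 cos θ for a rotation by angle θ. The function names are made up for the example.

```python
from collections import Counter
from itertools import permutations, product


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def cube_rotations():
    """All signed permutation matrices with determinant +1."""
    rotations = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            # Row r has a single entry, +1 or -1, in column perm[r].
            m = [[signs[r] if perm[r] == c else 0 for c in range(3)]
                 for r in range(3)]
            if det3(m) == 1:  # keep proper rotations, drop reflections
                rotations.append(m)
    return rotations


rotations = cube_rotations()

# trace 3: identity; trace 1: 90 or 270 degrees; trace -1: 180 degrees;
# trace 0: 120 or 240 degrees.
by_trace = Counter(m[0][0] + m[1][1] + m[2][2] for m in rotations)
```

The totals agree with the caption of Figure 2: the identity, 9 rotations about the three face axes, 8 about the four corner axes, and 6 half-turns about the edge axes (each axis passes through two opposite edges), 24 in all. The sphere, by contrast, has a continuum of rotations.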

In  Figure  3,  we  see  a  two-dimensional  schematic  representation  of  ice  and  water. 
Which  state  is  more  symmetric  here?  Naively,  the  ice  looks  much  more  symmetric; 
regular  arrangements  of  atoms  forming  a  lattice  structure.  The  water  looks  irregu¬ 
lar  and  disorganized.  On  the  other  hand,  if  one  rotated  Figure  3(b)  by  an  arbitrary 
angle,  it  would  still  look  like  water!  Ice  has  broken  rotational  symmetry:  one  can 
rotate Figure 3(a) only by multiples of 60°. It also has a broken translational symmetry:
it’s easy to tell if the picture is shifted sideways, unless one shifts by a whole
number  of  lattice  units.  While  the  snapshot  of  the  water  shown  in  the  figure  has  no 
symmetries,  water  as  a  phase  has  complete  rotational  and  translational  symmetry. 
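The claim about broken rotational symmetry can be checked numerically on a small lattice patch. This is a sketch under an assumption: Figure 3(a) is not reproduced here, so the code uses a plain triangular lattice with unit spacing, which is invariant under rotations by multiples of 60°. The function names and the patch size are choices made for the example.

```python
import math


def triangular_patch(max_norm2=7):
    """Points of a unit triangular lattice within a disk around the origin.

    A point i*a1 + j*a2 with a1 = (1, 0) and a2 = (1/2, sqrt(3)/2) has squared
    length i*i + i*j + j*j, so the disk test uses exact integer arithmetic.
    """
    points = []
    for i in range(-4, 5):
        for j in range(-4, 5):
            if i * i + i * j + j * j <= max_norm2:
                points.append((i + 0.5 * j, j * math.sqrt(3) / 2))
    return points


def rotate(point, degrees):
    """Rotate a 2D point about the origin."""
    a = math.radians(degrees)
    x, y = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))


def is_symmetry(points, degrees, tol=1e-9):
    """True if every rotated point lands on some point of the patch."""
    return all(
        any(math.dist(rotate(p, degrees), q) < tol for q in points)
        for p in points
    )


patch = triangular_patch()
allowed = [angle for angle in (30, 60, 90, 120, 180) if is_symmetry(patch, angle)]
```

Only a discrete set of angles survives for the crystal. A liquid, understood as the ensemble of all its snapshots, would pass the test for every angle.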

One  of  the  standard  tricks  to  see  if  two  materials  differ  by  a  symmetry  is  to 
try  to  change  one  into  the  other  smoothly.  Oil  and  water  won’t  mix,  but  I  think  oil 
and  alcohol  do,  and  alcohol  and  water  certainly  do.  By  slowly  adding  more  alcohol 
to oil, and then more water to the alcohol, one can smoothly interpolate between
the two phases. If they had different symmetries, there would have to be a first point during
the mixing at which the symmetry changes, and it is usually easy to tell when that
phase  transition  happens. 
II. DEFINE THE ORDER PARAMETER

Particle  physics  and  condensed-matter  physics  have  quite  different  philosophies. 
Particle  physicists  are  constantly  looking  for  the  building  blocks.  Once  pions  and 
protons were discovered to be made of quarks, they became demoted into engineering
problems. Now that quarks and electrons and photons are made of strings,
and strings are hard to study (at least experimentally), there is great anguish in
the high-energy community. Condensed-matter physicists, on the other hand, try
to understand why messy combinations of zillions of electrons and nuclei do such
interesting simple things. To them, the fundamental question is not discovering the
underlying quantum mechanical laws, but understanding and explaining the new
laws  that  emerge  when  many  particles  interact. 


FIGURE  3  Which  is  more  symmetric?  At  first  glance,  water  seems  to  have  much  less 
symmetry  than  ice.  The  picture  of  “two-dimensional”  ice  clearly  breaks  the  rotational 
invariance:  it  can  be  rotated  only  by  120°  or  240°.  It  also  breaks  the  translational 
invariance:  the  crystal  can  only  be  shifted  by  certain  special  distances  (whole  number 
of  lattice  units).  The  picture  of  water  has  no  symmetry  at  all:  the  atoms  are  jumbled 
together  with  no  long-range  pattern  at  all.  Water,  though,  isn’t  a  snapshot:  it  would  be 
better  to  think  of  it  as  a  combination  of  all  possible  snapshots!  Water  has  a  complete 
rotational  and  translational  symmetry:  the  pictures  will  look  the  same  if  the  container  is 
tipped  or  shoved. 
FIGURE 4 Magnet. We take the magnetization M as the order parameter for a
magnet. For a given material at a given temperature, the amount of magnetization
$|M| = M_0$ will be pretty well fixed, but the energy is often pretty much independent
of the direction $\hat{M} = M/M_0$ of the magnetization. (You can think of this as an arrow
pointing to the north end of each atomic magnet.) Often, the magnetization changes
directions smoothly in different parts of the material. (That’s why not all pieces of iron
are magnetic!) We describe the current state of the material by an order parameter
field M(x). The order parameter field is usually thought of as an arrow at each point
in space; it can also be thought of as a function taking points in space x into points on
the sphere $|M| = M_0$. This sphere is the order parameter space for the magnet.


As  one  might  guess,  we  don’t  keep  track  of  all  the  electrons  and  protons. We’re 
always  looking  for  the  important  variables,  the  important  degrees  of  freedom.  In 
a  crystal,  the  important  variables  are  the  motions  of  the  atoms  away  from  their 
lattice  positions.  In  a  magnet,  the  important  variable  is  the  local  direction  of  the 
magnetization  (an  arrow  pointing  to  the  “north”  end  of  the  local  magnet).  The  local 
magnetization  comes  from  complicated  interactions  between  the  electrons,  and  is 
partly  due  to  the  little  magnets  attached  to  each  electron  and  partly  due  to  the 
way  the  electrons  dance  around  in  the  material;  these  details  are  for  many  purposes 
unimportant. 

l^lThe  particle  physicists  use  order  parameter  fields,  too.  Their  order  parameter  fields  also  hide 
lots  of  details  about  what  their  quarks  and  gluons  are  composed  of.  The  main  difference  is  that 
they  don’t  know  of  what  their  fields  are  composed.  It  ought  to  be  reassuring  to  them  that  we 
don’t  always  find  our  greater  knowledge  very  helpful. 
The important variables are combined into an “order parameter field.” In Figure 4,
we see the order parameter field for a magnet. At each position x = (x, y, z),
we have a direction for the local magnetization M(x). The length of M is pretty
much fixed by the material, but the direction of the magnetization is undetermined.
By becoming a magnet, this material has broken the rotational symmetry. The order
parameter  M  labels  which  of  the  various  broken  symmetry  directions  the  material 
has  chosen. 

The  order  parameter  is  a  field:  at  each  point  in  our  magnet,  M(x)  tells  the  local 
direction  of  the  field  near  x.  Why  do  we  do  this?  Why  would  the  magnetization 
point  in  different  directions  in  different  parts  of  the  magnet?  Usually,  the  material 
has  lowest  energy  when  the  order  parameter  field  is  uniform,  when  the  symmetry  is 
broken  in  the  same  way  throughout  space.  In  practice,  though,  the  material  often 
doesn’t  break  symmetry  uniformly.  Most  pieces  of  iron  don’t  appear  magnetic, 
simply  because  the  local  magnetization  points  in  different  directions  at  different 
places.  The  magnetization  is  already  there  at  the  atomic  level:  to  make  a  magnet, 
you  pound  the  different  domains  until  they  line  up.  We’ll  see  in  this  section  that 
most  of  the  interesting  behavior  we  can  study  involves  the  way  the  order  parameter 
varies  in  space. 
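A small numerical sketch makes the point about unmagnetized iron concrete. It is an illustration only: each "domain" is modeled as one unit vector (so $M_0 = 1$), the domains are assumed to point in independent random directions, and the helper names are made up for the example.

```python
import math
import random


def random_unit_vector(rng):
    """Sample a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def length(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def net_magnetization(field):
    """Average of the local magnetization vectors over the sample."""
    n = len(field)
    return tuple(sum(m[k] for m in field) / n for k in range(3))


rng = random.Random(0)  # fixed seed so the run is reproducible

# Symmetry broken differently in each domain: random directions.
domains = [random_unit_vector(rng) for _ in range(5000)]

# Symmetry broken the same way everywhere: all arrows point "up".
uniform = [(0.0, 0.0, 1.0)] * 5000

net_domains = length(net_magnetization(domains))
net_uniform = length(net_magnetization(uniform))
```

Every local arrow has the same length in both samples, yet `net_domains` is close to zero while `net_uniform` equals one. The local order is present in both; only the way it varies in space differs.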

The order parameter field M(x) can be usefully visualized in two different
ways. On the one hand, one can think of a little vector attached to each point in
space. On the other hand, we can think of it as a mapping from real space into
order parameter space. That is, M is a function which takes different points in the
magnet onto the surface of a sphere (Figure 4). Mathematicians call the sphere $S^2$
because  it  locally  has  two  dimensions.  (They  don’t  care  what  dimension  the  sphere 
is  embedded  in.) 

Before varying our order parameter in space, let’s develop a few more examples.
The liquid crystals in LCD displays (like those in digital watches) are nematics.
Nematics are made of long, thin molecules which tend to line up so that their


Choosing  an  order  parameter  is  an  art.  Usually  it’s  a  new  phase  which  we  don’t  understand  yet, 
and  guessing  the  order  parameter  is  a  piece  of  figuring  out  what’s  going  on.  Also,  there  is  often 
more than one sensible choice. In magnets, for example, one can treat M as a fixed-length vector
in $S^2$, labelling the different broken symmetry states. This is the best choice at low temperatures,
where we study the elementary excitations and topological defects. For studying the transition
from low to high temperatures, when the magnetization goes to zero, it is better to consider M
as a vector of varying length (a vector in $\mathbb{R}^3$). Finding the simplest description for your needs is
often  the  key  to  the  problem. 

Most  magnets  are  crystals,  which  already  have  broken  the  rotational  symmetry.  For  some 
“Heisenberg” magnets, the effect of the crystal on the magnetism is small. Magnets are really
distinguished by the fact that they break time-reversal symmetry: if you reversed the arrow of
time, the magnetization would change direction!




Order  Parameters,  Broken  Symmetry,  and  Topology 


FIGURE  5  Nematic  liquid  crystal.  Nematic  liquid 
crystals  are  made  up  of  long,  thin  molecules  that 
prefer  to  align  with  one  another.  (Liquid  crystal 
watches  are  made  of  nematics.)  Since  they  don’t 
care  much  which  end  is  up,  their  order  parameter 
isn’t precisely the vector $\hat{n}$ along the axis of the
molecules. Rather, it is a unit vector up to the
equivalence $\hat{n} \equiv -\hat{n}$. The order parameter
space  is  a  half-sphere,  with  antipodal  points  on 
the  equator  identified.  Thus,  for  example,  the 
path  shown  over  the  top  of  the  hemisphere  is 
a  closed  loop:  the  two  intersections  with  the 
equator  correspond  to  the  same  orientations  of 
the  nematic  molecules  in  space. 


FIGURE 6 Two-dimensional crystal. A crystal consists of atoms arranged in regular,
repeating  rows  and  columns.  At  high  temperatures,  or  when  the  crystal  is  deformed  or 
defective,  the  atoms  will  be  displaced  from  their  lattice  positions.  The  displacements  u 
are  shown.  Even  better,  one  can  think  of  u(x)  as  the  local  translation  needed  to  bring 
the  ideal  lattice  into  registry  with  atoms  in  the  local  neighborhood  of  x.  Also  shown  is 
the ambiguity in the definition of u. Which “ideal” atom should we identify with a given
“real” one? This ambiguity makes the order parameter u equivalent to $u + m a\hat{x} + n a\hat{y}$.
Instead of a vector in two-dimensional space, the order parameter space is a square
with periodic boundary conditions.

FIGURE 7 Order parameter space for a two-dimensional crystal. Here we see that a
square  with  periodic  boundary  conditions  is  a  torus.  (A  torus  is  a  surface  of  a  doughnut, 
inner  tube,  or  bagel,  depending  on  your  background.) 


long  axes  are  parallel.  Nematic  liquid  crystals,  like  magnets,  break  the  rotational 
symmetry.  Unlike  magnets,  though,  the  main  interaction  isn’t  to  line  up  the  north 
poles,  but  to  line  up  the  axes.  (Think  of  the  molecules  as  American  footballs:  the 
same up and down.) Thus the order parameter isn’t a vector M but a headless vector
$\hat{n} \equiv -\hat{n}$. The order parameter space is a hemisphere, with opposing points along
the equator identified (Figure 5). This space is called $RP^2$ by the mathematicians
(the projective plane), for obscure reasons.
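The identification of a director with its negative can be made concrete in a few lines. This is only a sketch: the function names `canonical_director` and `same_orientation` are made up for illustration, and the choice of the upper hemisphere as the set of representatives is an assumption (any hemisphere would do).

```python
import math

def canonical_director(n):
    """Return the unit-length representative of the headless vector n
    that lies on the upper hemisphere (z >= 0)."""
    x, y, z = n
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    # Flip the sign so that the first nonzero entry of (z, y, x) is positive.
    # On the equator (z == 0) this picks one of the two antipodal points.
    for component in (z, y, x):
        if component > 0.0:
            break
        if component < 0.0:
            x, y, z = -x, -y, -z
            break
    return (x, y, z)

def same_orientation(n1, n2, tol=1e-12):
    """True if n1 and n2 describe the same nematic orientation (n == -n)."""
    a, b = canonical_director(n1), canonical_director(n2)
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    # Parallel or antiparallel unit vectors have |dot| equal to 1.
    return abs(abs(dot) - 1.0) <= tol
```

Two antipodal points on the equator, such as (1, 0, 0) and (-1, 0, 0), come out as the same orientation, which is the identification drawn in Figure 5.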

For  a  crystal,  the  important  degrees  of  freedom  are  associated  with  the  broken 
translational  order.  Consider  a  two-dimensional  crystal  which  has  lowest  energy 
when  in  a  square  lattice,  but  which  is  deformed  away  from  that  configuration  (Fig¬ 
ure  6).  This  deformation  is  described  by  an  arrow  connecting  the  undeformed  ideal 
lattice points with the actual positions of the atoms. If we are a bit more careful,
we say that u(x) is that displacement needed to align the ideal lattice in the local
region  onto  the  real  one.  By  saying  it  this  way,  u  is  also  defined  between  the  lattice 
positions;  there  still  is  a  best  displacement  which  locally  lines  up  the  two  lattices. 

The  order  parameter  u  isn’t  really  a  vector;  there  is  a  subtlety.  In  general, 
which  ideal  atom  you  associate  with  a  given  real  one  is  ambiguous.  As  shown  in 
Figure  6,  the  displacement  vector  u  changes  by  a  multiple  of  the  lattice  constant  a 
when  we  choose  a  different  reference  atom: 

$$u \equiv u + a\hat{x} \equiv u + m a\hat{x} + n a\hat{y}. \qquad (1)$$

The set of distinct order parameters forms a square with periodic boundary conditions.
As Figure 7 shows, a square with periodic boundary conditions has the same
topology as a torus, $T^2$. (The torus is the surface of a doughnut, bagel, or inner
tube.)
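As a minimal sketch of what "a square with periodic boundary conditions" means in practice, one can reduce any displacement to a representative inside one unit cell and compare displacements modulo lattice vectors. The function names and the default lattice constant `a = 1.0` are assumptions made for this illustration.

```python
def reduce_to_unit_cell(u, a=1.0):
    """Map a displacement u = (ux, uy) to its representative in [0, a) x [0, a)."""
    ux, uy = u
    return (ux % a, uy % a)

def same_order_parameter(u, v, a=1.0, tol=1e-9):
    """True if u and v differ by a lattice vector (m*a, n*a) with integer m, n."""
    def distance_to_lattice(d):
        # Distance from d to the nearest integer multiple of a.
        r = d % a
        return min(r, a - r)
    return (distance_to_lattice(u[0] - v[0]) < tol
            and distance_to_lattice(u[1] - v[1]) < tol)
```

For instance, the displacements (0.25, 0.5) and (3.25, -1.5) label the same point of the torus, since they differ by three lattice constants along x and two along y.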

Finally,  let’s  mention  that  guessing  the  order  parameter  (or  the  broken  symme¬ 
try)  isn’t  always  so  straightforward.  For  example,  it  took  many  years  before  anyone 
figured out that the order parameter for superconductors and superfluid helium 4
is a complex number $\psi$. The order parameter field $\psi(x)$ represents the “condensate
wave function,” which (extremely loosely) is a single quantum state occupied by a
large fraction of the Cooper pairs or helium atoms in the material. The corresponding
broken symmetry is closely related to the number of particles. In “symmetric,”
normal liquid helium, the local number of atoms is conserved; in superfluid helium,
the local number of atoms becomes indeterminate! (This is because many of the
atoms are condensed into that delocalized wave function.) Anyhow, the magnitude
of the complex number $\psi$ is a fixed function of temperature, so the order parameter
space is the set of complex numbers of magnitude $|\psi|$. Thus the order parameter
space for superconductors and superfluids is a circle $S^1$.

Now  we  examine  small  deformations  away  from  a  uniform  order  parameter  field. 


FIGURE 8 One-dimensional crystal: phonons. The order parameter field for a one-dimensional
crystal is the local displacement u(x). Long wavelength waves in u(x)
have  low  frequencies,  and  cause  sound.  Crystals  are  rigid  because  of  the  broken 
translational  symmetry.  Because  they  are  rigid,  they  fight  displacements.  Because 
there  is  an  underlying  translational  symmetry,  a  uniform  displacement  costs  no  energy. 
A  nearly  uniform  displacement,  thus,  will  cost  little  energy  and,  thus,  will  have  a  low 
frequency. These low-frequency elementary excitations are the sound waves in crystals.

III. EXAMINE THE ELEMENTARY EXCITATIONS

It’s amazing how slow human beings are. The atoms inside your eyelash collide with
one another billions of times each time you blink your eye. It’s not surprising,
then,  that  we  spend  most  of  our  time  in  condensed-matter  physics  studying  those 
things  in  materials  that  happen  slowly.  Typically  only  vast  conspiracies  of  immense 
numbers  of  atoms  can  produce  the  slow  behavior  that  humans  can  perceive. 

A good example is given by sound waves. We won't talk about sound waves in
air; air doesn't have any broken symmetries, so it doesn't belong in this paper.
Consider, instead, sound in the one-dimensional crystal shown in Figure 8. We
describe the material with an order parameter field u(x), where here x is the position
within the material and x − u(x) is the position of the reference atom within the
ideal  crystal. 

Now, there must be an energy cost for deforming the ideal crystal. There won't
be any cost, though, for a uniform translation: $u(x) \equiv u_0$ has the same energy as
the ideal crystal. (Shoving all the atoms to the right doesn't cost any energy.) So,
the energy will depend only on derivatives of the function u(x). The simplest energy
that one can write looks like

$$\mathcal{E} = \int dx\, \frac{\kappa}{2} \left(\frac{du}{dx}\right)^2, \qquad (2)$$

where $\kappa$ is an elastic constant.
(Higher derivatives won't be important for the low frequencies that humans can
hear.) Now, you may remember Newton's law F = ma. The force here is given by
the derivative of the energy, $F = -(d\mathcal{E}/du)$. The mass is represented by the density
of the material $\rho$. Working out the math (a variational derivative and an integration
by parts, for those who are interested) gives us the equation

$$\rho \ddot{u} = \kappa \frac{d^2 u}{dx^2}. \qquad (3)$$

The solutions to this equation

$$u(x, t) = u_0 \cos\left(2\pi\left(\frac{x}{\lambda} - \nu_\lambda t\right)\right) \qquad (4)$$

represent phonons or sound waves. The wavelength of the sound waves is $\lambda$, and
the frequency is $\nu_\lambda$. Plugging (4) into (3) gives us the relation

$$\nu_\lambda = \frac{\sqrt{\kappa/\rho}}{\lambda}. \qquad (5)$$


[6] We argue here that low frequency excitations come from spontaneously broken symmetries.
They can also come from conserved quantities: since air cannot be created or destroyed, a long-wavelength
density wave cannot relax quickly.

FIGURE 9 (a) Magnets: spin waves. Magnets break the rotational invariance of space.
Because they resist twisting the magnetization locally but don’t resist a uniform twist,
they have low-energy spin wave excitations. (b) Nematic liquid crystals: rotational
waves. Nematic liquid crystals also have low-frequency rotational waves.


The  frequency  gets  small  only  when  the  wavelength  gets  large.  This  is  the  vast 
conspiracy:  only  huge  sloshings  of  many  atoms  can  happen  slowly.  Why  does  the 
frequency  get  small?  Well,  there  is  no  cost  to  a  uniform  translation,  which  is  what 
(4) looks like for infinite wavelength. Why is there no energy cost for a uniform displacement?
Well, there is a translational symmetry; moving all the atoms the same
amount doesn’t change their interactions. But haven’t we broken that symmetry?
That  is  precisely  the  point. 
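The argument can be checked on a toy model. The sketch below assumes a periodic chain of identical masses joined by identical springs; this ball-and-spring chain is a standard textbook stand-in chosen for illustration, not the continuum description used in the text, and the names `elastic_energy`, `frequency`, `K`, `m`, and `a` are all hypothetical.

```python
import math

def elastic_energy(u, K=1.0):
    """Spring energy (K/2) * sum_j (u[j+1] - u[j])**2 of a periodic chain.
    It depends only on differences of neighboring displacements."""
    n = len(u)
    return 0.5 * K * sum((u[(j + 1) % n] - u[j]) ** 2 for j in range(n))

def frequency(wavelength, K=1.0, m=1.0, a=1.0):
    """Normal-mode frequency nu = omega / (2 pi) of the harmonic chain,
    using omega = 2 sqrt(K/m) |sin(k a / 2)| with k = 2 pi / wavelength."""
    k = 2.0 * math.pi / wavelength
    return (1.0 / math.pi) * math.sqrt(K / m) * abs(math.sin(0.5 * k * a))
```

A uniform shift of every atom gives `elastic_energy` equal to zero, and `frequency(wavelength)` falls off like 1/wavelength once the wavelength is many lattice spacings long, which is the behavior the paragraph above describes in words.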

Long  after  phonons  were  understood,  Jeremy  Goldstone  started  to  think  about 
broken  symmetries  and  order  parameters  in  the  abstract.  He  found  a  rather  general 
argument that, whenever a continuous symmetry (rotations, translations, SU(3),


James  P.  Sethna 




. . .)  is  broken,  long  wavelength  modulations  in  the  symmetry  direction  should  have 
low  frequencies.  The  fact  that  the  lowest  energy  state  has  a  broken  symmetry 
means  that  the  system  is  stiff:  modulating  the  order  parameter  will  cost  an  energy 
rather  like  that  in  Eq.  (2).  In  crystals,  the  broken  translational  order  introduces  a 
rigidity  to  shear  deformations,  and  low  frequency  phonons  (Figure  8).  In  magnets, 
the  broken  rotational  symmetry  leads  to  a  magnetic  stiffness  and  spin  waves  (Fig¬ 
ure  9(a)).  In  nematic  liquid  crystals,  the  broken  rotational  symmetry  introduces  an 
orientational  elastic  stiffness  (it  pours,  but  resists  bending!)  and  rotational  waves 
(Figure  9(b)). 

In superfluids, the broken gauge symmetry leads to a stiffness which results
in the superfluidity. Superfluidity and superconductivity really aren’t any more
amazing than the rigidity of solids. Isn’t it amazing that chairs are rigid? Push on
a few atoms on one side, and $10^9$ atoms away atoms will move in lock-step. In the
same way, decreasing the flow in a superfluid must involve a cooperative change in
a macroscopic number of atoms, and thus never happens spontaneously any more
than two parts of the chair ever drift apart.

The low-frequency Goldstone modes in superfluids are heat waves! (Don’t be
jealous: liquid helium has rather cold heat waves.) This is often called second sound,
but it is really a periodic modulation of the temperature which passes through the
material like sound does through a metal.

O.K., now we’re getting the idea. Just to round things out, what about superconductors?
They’ve got a broken gauge symmetry and a stiffness against decay of
the superconducting current. What is the low-energy excitation? It doesn’t have
one. But what about Goldstone’s theorem? Well, you know about physicists and
theorems....

That’s  actually  quite  unfair:  Goldstone  surely  had  conditions  on  his  theorem 
which  excluded  superconductors.  Actually,  I  believe  Goldstone  was  studying  su¬ 
perconductors  when  he  came  up  with  his  theorem.  It’s  just  that  everybody  forgot 
the extra conditions, and just remembered that you always got a low frequency
mode when you broke a continuous symmetry. We, of course, understood all along
why there isn’t a Goldstone mode for superconductors: it’s related to the Meissner
effect.  The  high-energy  physicists  forgot,  though,  and  had  to  rediscover  it  for  them¬ 
selves.  Now  we  all  call  the  loophole  in  Goldstone’s  theorem  the  Higgs  mechanism, 
because  (to  be  truthful)  Higgs  and  his  high-energy  friends  found  a  much  simpler 
and  more  elegant  explanation  than  we  had.  We’ll  discuss  Meissner  effects  and  the 
Higgs  mechanism  in  another  chapter.'* 

I’d like to end this section, though, by bringing up another exception to Goldstone’s
theorem: one we’ve known about even longer, but which we don’t have a
nice explanation for. What about the orientational order in crystals? Crystals break
both the continuous translational order and the continuous orientational order. The

FIGURE 10 Dislocation in a crystal. Here is a topological defect in a crystal. We can
see  that  one  of  the  rows  of  atoms  on  the  right  disappears  halfway  through  our  sample. 
The  place  where  it  disappears  is  a  defect,  because  it  doesn’t  locally  look  like  a  piece 
of  the  perfect  crystal.  It  is  a  topological  defect,  because  it  can’t  be  fixed  by  any  local 
rearrangement.  No  reshuffling  of  atoms  in  the  middle  of  the  sample  can  change  the  fact 
that five rows enter from the right, and only four leave from the left! The Burgers vector
of  a  dislocation  is  the  net  number  of  extra  rows  and  columns,  combined  into  a  vector 
(columns,  rows). 


phonons are the Goldstone modes for the translations, but there are no orientational
Goldstone modes.[7] We’ll discuss this further in another chapter,'* but I think this
is  one  of  the  most  interesting  unsolved  basic  questions  in  the  subject. 


[7] In two dimensions, crystals provide another loophole in a well-known theorem. Mermin and
Wagner proved many years ago that two-dimensional systems with a continuous symmetry cannot
have a broken symmetry at finite temperature. At least, that’s the English phrase everyone quotes
when they discuss the theorem: the theorem is stated in a much more technical way. Now, crystals
in  two  dimensions  actually  don’t  break  the  translational  symmetry:  at  finite  temperatures,  the 
atoms  wiggle  enough  so  that  the  atoms  don’t  sit  in  lock-step  over  infinite  distances;  this  is  correctly 
stated  as  an  important  application  of  the  theorem.  But  the  crystals  do  have  a  broken  orientational 
symmetry:  the  crystal  axes  point  in  exactly  the  same  directions  throughout  space.  Again,  the 
theorem  has  technical  conditions  which  exclude  crystalline  orientational  order  in  the  presence  of 
translational order.

FIGURE 11 Loop around the dislocation, mapped onto order parameter space. How do
we  think  about  our  defect  in  terms  of  order  parameters  and  order  parameter  spaces? 
Consider  a  closed  loop  around  the  defect.  The  order  parameter  field  u  changes  as  we 
move around the loop. The positions of the atoms around the loop with respect to their
local “ideal” lattice drift upward continuously as we traverse the loop. This precisely
corresponds  to  a  loop  around  the  order  parameter  space;  the  loop  passes  once  through 
the  hole  in  the  torus.  A  loop  around  the  hole  corresponds  to  an  extra  column  of  atoms. 
Moving  the  atoms  slightly  will  deform  the  loop,  but  won’t  change  the  number  of  times 
the  loop  winds  through  or  around  the  hole.  Two  loops  which  traverse  the  torus  the 
same  number  of  times  through  and  around  are  equivalent.  The  equivalence  classes 
are labelled precisely by pairs of integers (just like the Burgers vectors), and the first
homotopy group of the torus is $\mathbb{Z} \times \mathbb{Z}$.

IV. CLASSIFY THE TOPOLOGICAL DEFECTS

When  I  was  in  graduate  school,  the  big  fashion  was  topological  defects.  Everybody 
was  studying  homotopy  groups  and  finding  exotic  systems  to  write  papers  about. 
It was, in the end, a reasonable thing to do.[8] It is true that in a typical application
you’ll  be  able  to  figure  out  what  the  defects  are  without  homotopy  theory.  You'll 
spend  forever  drawing  pictures  to  convince  anyone  else,  though.  Most  important, 
homotopy  theory  helps  you  to  think  about  defects. 

A  defect  is  a  tear  in  the  order  parameter  field.  A  topological  defect  is  a  tear  that 
can’t  be  patched.  Consider  the  piece  of  two-dimensional  crystal  shown  in  Figure  10. 
Starting  in  the  middle  of  the  region  shown,  there  is  an  extra  row  of  atoms.  (This 
is  called  a  dislocation.)  Away  from  the  middle,  the  crystal  locally  looks  fine:  it’s 
a  little  distorted,  but  there  is  no  problem  seeing  the  square  grid  and  defining  an 
order  parameter.  Can  we  rearrange  the  atoms  in  a  small  region  around  the  start  of 
the  extra  row,  and  patch  the  defect? 

No.  The  problem  is  that  we  can  tell  there  is  an  extra  row  without  ever  coming 
near  to  the  center.  The  traditional  way  of  doing  this  is  to  traverse  a  large  loop 
surrounding  the  defect,  and  count  the  net  number  of  rows  crossed  on  the  path.  In 
the  path  shown,  there  are  two  rows  going  up  and  three  going  down:  no  matter  how 
far we stay from the center, there will naturally always be an extra row on the
right. 

How  can  we  generalize  this  basic  idea  to  a  general  problem  with  a  broken  sym¬ 
metry?  Remember  that  the  order  parameter  space  for  the  two-dimensional  square 
crystal  is  a  torus  (see  Figure  7).  Remember  that  the  order  parameter  at  a  point 
is  that  translation  which  aligns  a  perfect  square  grid  to  the  deformed  grid  at  that 
point.  Now,  what  is  the  order  parameter  far  to  the  left  of  the  defect  (a),  compared 
to  the  value  far  to  the  right  (d)?  Clearly,  the  lattice  to  the  right  is  shifted  vertically 
by  half  a  lattice  constant:  the  order  parameter  has  been  shifted  halfway  around 
the  torus.  As  shown  in  Figure  11,  along  the  top  half  of  a  clockwise  loop,  the  order 
parameter  (position  of  the  atom  within  the  unit  cell)  moves  upward,  and  along 
the  bottom  half,  again  moves  upward.  All  in  all,  the  order  parameter  circles  once 
around the torus. The winding number around the torus is the net number of times
the  torus  is  circumnavigated  when  the  defect  is  orbited  once. 
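Counting how often the order parameter circles the torus can be sketched numerically. The snippet below assumes the displacement field has been sampled at closely spaced points around a closed loop in real space, closely enough that u changes by less than half a lattice constant between neighbors; the function names and the sampling format are assumptions made for this illustration.

```python
def wrap(d, a=1.0):
    """Shift d by a multiple of a into the interval [-a/2, a/2)."""
    return (d + 0.5 * a) % a - 0.5 * a

def winding_numbers(samples, a=1.0):
    """Net number of times the order parameter circles the torus in each
    direction as the closed loop of samples [(ux, uy), ...] is traversed."""
    total_x = 0.0
    total_y = 0.0
    n = len(samples)
    for j in range(n):
        x0, y0 = samples[j]
        x1, y1 = samples[(j + 1) % n]   # the loop closes on itself
        # Accumulate the smallest change compatible with u == u + (m*a, n*a).
        total_x += wrap(x1 - x0, a)
        total_y += wrap(y1 - y0, a)
    return (round(total_x / a), round(total_y / a))
```

A loop along which the vertical displacement drifts steadily upward by one lattice constant returns (0, 1); a loop in a uniform region returns (0, 0); and small wiggles of the samples leave the result unchanged.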

This  is  why  they  are  called  topological  defects.  Topology  is  the  study  of  curves 
and surfaces where bending and twisting are ignored. An order parameter field, no
matter  how  contorted,  which  doesn’t  wind  around  the  torus  can  always  be  smoothly 
bent  and  twisted  back  into  a  uniform  state.  If  along  any  loop,  though,  the  order 
parameter  winds  either  around  the  hole  or  through  it  a  net  number  of  times,  then 
enclosed  in  that  loop  is  a  defect  which  cannot  be  bent  or  twisted  flat:  the  winding 
number  can’t  change  by  an  integer  in  a  smooth  and  continuous  fashion. 

How  do  we  categorize  the  defects  for  two-dimensional  square  crystals?  Well, 
there  are  two  integers:  the  number  of  times  we  go  around  the  central  hole  and  the 

[8] The next fashion, catastrophe theory, never became important for anything.

number of times we pass through it. In the traditional description, this corresponds
precisely to the number of extra rows and columns of atoms we pass by. This was
called the Burgers vector in the old days, and nobody needed to learn about tori
to understand it. We now call it the first homotopy group of the torus:

$$\Pi_1(T^2) = \mathbb{Z} \times \mathbb{Z} \qquad (6)$$

where $\mathbb{Z}$ represents the integers. That is, a defect is labeled by two integers (m, n),
where  m  represents  the  number  of  extra  rows  of  atoms  on  the  right-hand  part  of 
the  loop,  and  n  represents  the  number  of  extra  columns  of  atoms  on  the  bottom. 

Here’s  where  in  the  lecture  I  showed  the  practical  importance  of  topological 
defects.  Unfortunately  for  you,  I  can’t  enclose  a  soft  copper  tube  for  you  to  play 
with,  the  way  I  did  at  the  lecture.  They're  a  few  cents  each,  and  machinists  on  two 
continents  have  been  quite  happy  to  cut  them  up  for  my  demonstrations,  but  they 
don’t pack well into books. Anyhow, most metals, and copper in particular, exhibit
what is called work hardening. It’s easy to bend the tube, but it’s amazingly tough
to bend it back. The soft original copper is relatively defect-free. To bend, the crystal
has to create lots of line dislocations, which move around to produce the bending.
The line defects get tangled up and get in the way of any new defects. So, when
you try to bend the tube back, the metal becomes much stiffer. Work hardening
has had a noticeable impact on the popular culture. The magician effortlessly bends
the  metal  bar,  and  the  strongman  can’t  straighten  it. . ..  Superman  bends  the  rod 
into  a  pair  of  handcuffs  for  the  criminals. . .. 

Before  we  explain  why  these  curves  form  a  group,  let’s  give  some  more  ex¬ 
amples  of  topological  defects  and  how  they  can  be  classified.  Figure  12(a)  shows  a 
“hedgehog”  defect  for  a  magnet.  The  magnetization  simply  points  straight  out  from 
the  center  in  all  directions.  How  can  we  tell  that  there  is  a  defect,  always  staying 
far  away?  Since  this  is  a  point  defect  in  three  dimensions,  we  have  to  surround 
it  with  a  sphere.  As  we  move  around  on  this  sphere  in  ordinary  space,  the  order 
parameter  moves  around  the  order  parameter  space  (which  also  happens  to  be  a 
sphere, of radius |M|). In fact, the order parameter space is covered exactly once
as we surround the defect. This is called the wrapping number and doesn’t change
as we wiggle the magnetization in smooth ways. The point defects of magnets are
classified by the wrapping number:

$$\Pi_2(S^2) = \mathbb{Z}. \qquad (7)$$


[9] This again is the mysterious lack of rotational Goldstone modes in crystals.

FIGURE 12 (a) Hedgehog defect. Magnets have no line defects (you can’t lasso a
basketball), but do have point defects. Here is shown the hedgehog defect, $M(x) = M_0 \hat{x}$.
You can’t surround a point defect in three dimensions with a loop, but you can
enclose it in a sphere. The order parameter space, remember, is also a sphere. The
order parameter field takes the enclosing sphere and maps it onto the order parameter
space, wrapping it exactly once. The point defects in magnets are categorized by
this wrapping number: the second homotopy group of the sphere is $\mathbb{Z}$, the integers.
(b) Defect line in a nematic liquid crystal. You can’t lasso the sphere, but you can lasso
a hemisphere! Here is the defect corresponding to the path shown in Figure 5. As you
pass clockwise around the defect line, the order parameter rotates counterclockwise
by 180°. This path on Figure 5 would actually have wrapped around the right-hand
side  of  the  hemisphere.  Wrapping  around  the  left-hand  side  would  have  produced  a 
defect  which  rotated  clockwise  by  180°.  (Imagine  that!)  The  path  in  Figure  5  is  halfway 
in  between,  and  illustrates  that  these  two  defects  are  really  not  different  topologically. 


Here, the 2 subscript says that we’re studying the second homotopy group. It represents
the fact that we are surrounding the defect with a two-dimensional spherical
surface,  rather  than  the  one-dimensional  curve  we  used  in  the  crystal. 
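The wrapping number can also be estimated numerically. The following is a rough sketch, not a production routine: it covers the enclosing sphere with a latitude-longitude grid of small triangles, maps each corner through the magnetization field, and adds up the signed solid angles of the image triangles (using the Van Oosterom and Strackee formula for the solid angle of a triangle). The function names, the grid sizes, and the convention that `field` takes a point on the unit sphere and returns a nonzero 3-vector are all assumptions of this example.

```python
import math

def unit(v):
    """Normalize a 3-vector to unit length."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]

def solid_angle(a, b, c):
    """Signed solid angle of the spherical triangle with unit-vector corners
    a, b, c: tan(Omega/2) = a.(b x c) / (1 + a.b + b.c + c.a)."""
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return 2.0 * math.atan2(dot(a, cross), 1.0 + dot(a, b) + dot(b, c) + dot(c, a))

def wrapping_number(field, n_theta=40, n_phi=80):
    """Number of times field wraps the unit sphere around order parameter space."""
    def m(i, j):
        theta = math.pi * i / n_theta
        phi = 2.0 * math.pi * j / n_phi
        x = (math.sin(theta) * math.cos(phi),
             math.sin(theta) * math.sin(phi),
             math.cos(theta))
        return unit(field(x))
    total = 0.0
    for i in range(n_theta):
        for j in range(n_phi):
            p00, p10, p01, p11 = m(i, j), m(i + 1, j), m(i, j + 1), m(i + 1, j + 1)
            # Split each grid cell into two triangles with outward orientation.
            total += solid_angle(p00, p10, p01) + solid_angle(p10, p11, p01)
    return round(total / (4.0 * math.pi))
```

The hedgehog `lambda x: x` gives 1, a uniform magnetization gives 0, and the inward-pointing hedgehog gives -1. The grid must be fine enough that each image triangle covers only a small patch of the sphere.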

You  might  get  the  impression  that  a  strength  7  defect  is  really  just  seven 
strength  1  defects,  stuffed  together.  You’d  be  quite  right:  occasionally,  they  do 
bunch  up,  but  usually  big  ones  decompose  into  small  ones.  This  doesn’t  mean, 
though,  that  adding  two  defects  always  gives  a  bigger  one.  In  nematic  liquid  crystals, 

[10] The zeroth homotopy group classifies domain walls. The third homotopy group, applied to
defects in three-dimensional materials, classifies what the condensed matter people call textures
and the particle people sometimes call skyrmions. The fourth homotopy group, applied to defects
in space-time path integrals, classifies types of instantons.

two line defects are as good as none! Magnets didn’t have any line defects: a loop
in  real  space  never  surrounds  something  it  can’t  smooth  out.  Formally,  the  first 
homotopy  group  of  the  sphere  is  zero:  you  can’t  loop  a  basketball.  For  a  nematic 
liquid  crystal,  though,  the  order  parameter  space  was  a  hemisphere  (Figure  5). 
There  is  a  loop  on  the  hemisphere  in  Figure  5  that  you  can't  get  rid  of  by  twisting 
and  stretching.  It  doesn’t  look  like  a  loop,  but  you  have  to  remember  that  the  two 
opposing  points  on  the  equator  really  represent  the  same  nematic  orientation.  The 
corresponding defect has a director field $\hat{n}$ which rotates 180° as the defect is orbited:
Figure 12(b) shows one typical configuration (called an s = −1/2 defect). Now, if
you put two of these defects together, they cancel. (I can't draw the pictures, but
consider it a challenging exercise in geometric visualization.) Nematic line defects
add modulo 2, like clock arithmetic in elementary school:

$$\Pi_1(RP^2) = \mathbb{Z}_2. \qquad (8)$$

Two  parallel  defects  can  coalesce  and  heal,  even  though  each  one  individually  is 
stable:  each  goes  halfway  around  the  sphere,  and  the  whole  loop  can  be  shrunk  to 
zero. 
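The modulo-2 bookkeeping is short enough to write down. In this sketch a line defect is labeled by a strength s that is a multiple of 1/2 (the director turns by 360° times s around the line); reducing 2s modulo 2 gives its topological class. The function names are hypothetical, and treating integer-strength lines as topologically trivial is the assumption that goes with the three-dimensional nematic described here.

```python
from fractions import Fraction

def homotopy_class(s):
    """Z_2 class of a nematic line defect of strength s (a multiple of 1/2):
    1 for a stable defect, 0 for a configuration that can be smoothed away."""
    twice = Fraction(s) * 2
    if twice.denominator != 1:
        raise ValueError("defect strength must be a multiple of 1/2")
    return twice.numerator % 2

def combine(s1, s2):
    """Class of the defect obtained when two parallel lines coalesce."""
    return homotopy_class(Fraction(s1) + Fraction(s2))
```

Both s = +1/2 and s = −1/2 land in class 1, matching the remark in Figure 12(b) that the two are not topologically different, and any two of them combine to class 0.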


FIGURE  13  Multiplying  two  loops.  The  product  of  two  loops  is  given  by  starting  from 
their  intersection,  traversing  the  first  loop,  and  then  traversing  the  second.  The  inverse 
of  a  loop  is  clearly  the  same  loop  travelled  backward:  compose  the  two,  and  one  can 
shrink  them  continuously  back  to  nothing.  This  definition  makes  the  homotopy  classes 
into  a  group.  This  multiplication  law  has  a  physical  interpretation.  If  two  defect  lines 
coalesce, their homotopy class must, of course, be given by the loop enclosing both.
This large loop can be deformed into two little loops, so the homotopy class of the
coalesced line defect is the product of the homotopy classes of the individual defects.


FIGURE 14 Defect entanglement. (a) Can a defect line of class $\alpha$ pass by a line of
class $\beta$, without getting topologically entangled? (b) We see that we can pass by if we
leave a trail: is the connecting double line topologically trivial? Encircle the double line
by a loop. The loop can be wiggled and twisted off the double line, but it still circles
around the two legs of the defects $\alpha$ and $\beta$. (c) The homotopy class of the loop is
precisely $\beta\alpha\beta^{-1}\alpha^{-1}$, which is trivial precisely when $\beta\alpha = \alpha\beta$. Thus two defect lines
can pass by one another if their homotopy classes commute!

Finally, why are these defect categories a group? A group is a set with a multiplication
law, not necessarily commutative, and an inverse for each element. For
the  first  homotopy  group,  the  elements  of  the  group  are  equivalence  classes  of  loops: 
two  loops  are  equivalent  if  one  can  be  stretched  and  twisted  onto  the  other,  staying 
on the manifold at all times.[11] For example, any loop going through the hole from
the top (as in the top right-hand torus in Figure 13) is equivalent to any other one.
To multiply a loop u and a loop v, one must first make sure that they meet at some
point (by dragging them together, probably). Then one defines a new loop $u \otimes v$
by traversing first the loop u and then v.[12]

The  inverse  of  a  loop  u  is  just  the  loop  which  runs  along  the  same  path  in 
the  reverse  direction.  The  identity  element  consists  of  the  equivalence  class  of  loops 
which  don’t  enclose  a  hole:  they  can  all  be  contracted  smoothly  to  a  point  (and  thus 
to  one  another).  Finally,  the  multiplication  law  has  a  direct  physical  implication: 
encircling  two  defect  lines  of  strength  u  and  v  is  completely  equivalent  to  encircling 
one defect of strength $u \otimes v$.
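The multiplication and inverse of loops can be tried out in the simplest case, where the order parameter space is a circle (as for the superfluid). In this sketch a loop is a Python function from the angle θ in [0, 2π] to a phase angle, with both loops starting at phase 0 so that they meet; the names `product`, `inverse`, and `winding` are chosen for illustration only.

```python
import math

TWO_PI = 2.0 * math.pi

def product(u, v):
    """Loop that traverses u at double speed and then v at double speed."""
    def uv(theta):
        if theta < math.pi:
            return u(2.0 * theta)
        # v is periodic, so v(2*theta - 2*pi) is the same point as v(2*theta).
        return v(2.0 * theta - TWO_PI)
    return uv

def inverse(u):
    """The same path run in the reverse direction."""
    return lambda theta: u(TWO_PI - theta)

def winding(loop, samples=400):
    """Net number of times the loop circles the order parameter circle."""
    total = 0.0
    for j in range(samples):
        t0 = TWO_PI * j / samples
        t1 = TWO_PI * (j + 1) / samples
        d = loop(t1) - loop(t0)
        # Wrap each small step into [-pi, pi) so jumps by 2*pi are ignored.
        total += (d + math.pi) % TWO_PI - math.pi
    return round(total / TWO_PI)
```

With `once = lambda t: t` and `twice = lambda t: 2.0 * t`, the product of `once` and `twice` winds three times, and the product of `once` with its inverse winds zero times, so it can be shrunk to a point.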

This all seems pretty trivial: maybe thinking about order parameter spaces and
loops  helps  one  think  more  clearly,  but  are  there  any  real  uses  for  talking  about  the 
group  structure?  Let  me  conclude  this  paper  with  an  amazing,  physically  interesting 
consequence  of  the  multiplication  laws  we  described.  There  is  a  fine  discussion  of 
this  in  Mermin’s  article,^  but  I  learned  about  it  from  Dan  Stein’s  thesis.® 

Can  two  defect  lines  cross  one  another?  Figure  14(a)  shows  two  defect  lines, 
of strength (homotopy type) $\alpha$ and $\beta$, which are not parallel. Suppose there is an
external force pulling the $\alpha$ defect past the $\beta$ one. Clearly, if we bend and stretch
the defect as shown in Figure 14(b), it can pass by, but there is a trail left behind,
of two defect lines. $\alpha$ can really leave $\beta$ behind only if it is topologically possible to
erase  the  trail.  Can  the  two  lines  annihilate  one  another?  Only  if  their  net  strength 
is  zero,  as  measured  by  the  loop  in  Figure  14(b). 

Now,  get  two  wires  and  some  string.  Bend  the  wires  into  the  shape  found  in 
Figure  14(b).  Tie  the  string  into  a  fairly  large  loop,  surrounding  the  doubled  portion. 
Wiggle  the  string  around,  and  try  to  get  the  string  out  from  around  the  doubled 
section.  You’ll  find  that  you  can’t  completely  remove  the  string  (no  fair  pulling  the 
string  past  the  cut  ends  of  the  defect  lines!),  but  that  you  can  slide  it  downward 
into the configuration shown in Figure 14(c).

Now,  in  Figure  14(c)  we  see  that  each  wire  is  encircled  once  clockwise  and 
once  counterclockwise.  Don’t  they  cancel?  Not  necessarily!  If  you  look  carefully, 
the order of traversal is such that the net homotopy class is $\beta\alpha\beta^{-1}\alpha^{-1}$, which is
only the identity if $\beta$ and $\alpha$ commute. Thus the physical entanglement problem


A loop is a continuous mapping from the circle into the order parameter space: $\theta \to u(\theta)$,
$0 \le \theta < 2\pi$. When we encircle the defect with a loop, we get a loop in order parameter space as shown
in Figure 4: $\theta \to \mathbf{x}(\theta)$ is the loop in real space, and $\theta \to u(\mathbf{x}(\theta))$ is the loop in order parameter
space. Two loops are equivalent if there is a continuous one-parameter family of loops connecting
one to the other: $u \equiv v$ if there exists $u_t(\theta)$ continuous both in $\theta$ and in $0 \le t \le 1$, with $u_0 \equiv u$
and $u_1 \equiv v$.

b^lThat is, $u \otimes v(\theta) \equiv u(2\theta)$ for $0 \le \theta \le \pi$, and $\equiv v(2\theta)$ for $\pi \le \theta \le 2\pi$.
for defects is directly connected to the group structure of the loops: commutative
defects can pass through one another; noncommutative defects entangle.

I’d  like  to  be  able  to  tell  you  that  the  work  hardening  in  copper  is  due  to 
topological  entanglements  of  defects.  It  wouldn’t  be  true.  The  homotopy  group  of 
dislocation lines in fcc copper is commutative. (It’s rather like the two-dimensional
square lattice: if $\alpha = (m, n)$ and $\beta = (o, p)$ with $m, n, o, p$ the number of extra
horizontal and vertical lines of atoms, then $\alpha\beta = (m + o, n + p) = \beta\alpha$.) The reason
dislocation lines in copper don’t pass through one another is energetic, not topological.
The two dislocation lines interact strongly with one another, and energetically
get  stuck  when  they  try  to  cross.  Remember  at  the  beginning  of  the  paper,  I  said 
that  there  were  gaps  in  the  system:  the  topological  theory  can  only  say  when  things 
are  impossible  to  do,  not  when  they  are  difficult  to  do. 

I’d like to be able to tell you that this beautiful connection between the commutativity
of the group and the entanglement of defect lines is nonetheless important
in lots of other contexts. That, too, would not be true. There are two types of
materials I know of which are supposed to suffer from defect lines which topologically
entangle. The first are biaxial nematics, which were thoroughly analyzed
theoretically  before  anyone  found  one.  The  other  are  the  metallic  glasses,  where 
David  Nelson  has  a  theory  of  defect  lines  needed  to  relieve  the  frustration.  We'll 
discuss  closely  related  theories  in  section  3.  Nelson’s  defects  don’t  commute,  and  so 
can’t  cross  one  another.  He  originally  hoped  to  explain  the  freezing  of  the  metallic 
glasses  into  random  configurations  as  an  entanglement  of  defect  lines.  Nobody  has 
ever  been  able  to  take  this  idea  and  turn  it  into  a  real  calculation,  though. 
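
The commutator test above can be tried out in a few lines. The following is a minimal Python sketch, not part of the original lecture. It assumes two illustrative groups: pairs of integers under addition (the square-lattice dislocations described above), and the eight-element quaternion group, which is the standard textbook result for the defect group of biaxial nematics but is not derived in this paper. All function names are hypothetical.

```python
# Net homotopy class of the loop in Figure 14(c): beta * alpha * beta^-1 * alpha^-1.
# If this is the identity, the trail can be erased and the defects can cross.

def commutator(mul, inv, beta, alpha):
    """Return beta * alpha * beta^-1 * alpha^-1 for a group given by mul and inv."""
    return mul(mul(mul(beta, alpha), inv(beta)), inv(alpha))

# Square-lattice dislocations: (m, n) pairs, combined by adding components.
def lattice_mul(a, b):
    return (a[0] + b[0], a[1] + b[1])

def lattice_inv(a):
    return (-a[0], -a[1])

# Unit quaternions stored as (w, x, y, z); assumed here as a model for biaxial nematics.
def quat_mul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def quat_inv(q):
    # For a unit quaternion the inverse is the conjugate.
    a, b, c, d = q
    return (a, -b, -c, -d)

QUAT_I = (0, 1, 0, 0)
QUAT_J = (0, 0, 1, 0)

if __name__ == "__main__":
    # Commutative case: the commutator is the identity (0, 0), so the lines can cross.
    print(commutator(lattice_mul, lattice_inv, (1, 2), (3, -1)))
    # Noncommutative case: the commutator is -1, not the identity (1, 0, 0, 0).
    print(commutator(quat_mul, quat_inv, QUAT_I, QUAT_J))
```

In the second case the leftover element is a defect in its own right, which is the trail that cannot be erased.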

Enough,  then,  of  the  beautiful  and  elegant  world  of  homotopy  theory:  let’s 
begin  to  think  about  what  order  parameter  configurations  are  actually  formed  in 
practice. 
Meissner Effects and Constraints


In  the  last  paper, I  explained  how  condensed-matter  and  high-energy 
physicists  used  topological  theories  to  describe  defects  excitations  in  solids. 
In this paper, I’m going to make fun of topology. Actually, I’m going
to start by talking about constraints, then “massive” fields and how they
produce constraints. I’ll then turn to the Meissner-Higgs effect in superconductors,
and finally explain why I don’t understand crystals.


I.  CONSTRAINTS 

Consider Figure 1. See the beautiful ellipses and hyperbolas? Remember that topology
treats ellipses as rubber bands. Any topological theory has got to miss the key


^^lEverything  I  know  about  focal  conics  and  smectic  liquid  crystals^  was  explained  to  me  by 
Maurice  Kleman,  who  also  was  one  of  the  originators  of  the  topological  theory  of  defects.  No 
disrespect  is  intended. 


1991  Lectures  in  Complex  Systems,  SFI  Studies  in  the  Sciences  of  Complexity, 

Lect. Vol. IV, Eds. L. Nadel & D. Stein, Addison-Wesley, 1992
FIGURE 1 Ellipses: Defects in a Liquid Crystal. This is a drop of smectic A liquid
crystal,  squeezed  between  two  microscope  slides.  The  microscope  is  focused  on 
the  surface  of  the  drop,  where  it  contacts  the  glass.  Notice  the  beautiful,  geometrical 
ellipses.  Notice  that  a  line  seems  to  exit  from  the  focus  of  each  ellipse.  This  line  turns 
out  to  be  a  hyperbola  (Figure  4).  The  visible  ellipses  and  the  hyperbolas  are  where 
the  smectic  layers  pinch  off  to  form  cusps.  These  defects  are  not  topological:  they  are 
geometrical  consequences  of  the  constraint  of  equal  layer  spacing.  From  Ref.  3,  Figure 
7.2,  photo  by  C.  Williams. 


feature  of  the  beautiful  structures  produced  here;  the  geometrically  perfect  ellipses 
with  dark  lines  coming  out  of  one  focus. 

Figure  1  is  a  photograph  of  a  drop  of  fluid,  squeezed  between  two  microscope 
slides.  The  microscope  is  focused,  let’s  say,  on  the  surface  between  the  fluid  and 
the bottom microscope slide: the ellipses are stuck onto the glass. The sizes of the
ellipses  are  roughly  given  by  the  thickness  of  the  fluid  layer.  The  fluid  is  a  smectic  A 
liquid  crystal.  deGennes^  has  a  fine  discussion  and  some  nice  pictures,  too. 
FIGURE 2 Order in Smectic Liquid Crystals. Smectic liquid crystals are formed of
layers of molecules. In each layer, the molecules are in a random, liquid configuration.
Crystals have broken translational symmetry along three independent axes; smectic A
liquid  crystals  have  broken  translational  symmetry  in  only  one  direction  (normal  to  the 
layers). 


In 1910, Friedel figured out why this liquid crystal forms these geometrical
structures.  He  learned  all  he  needed  to  know  from  his  high-school  geometry  class. 
He  actually  worked  backward,  and  used  the  ellipses  to  deduce  what  kind  of  broken 
symmetry  the  liquid  had.  Since  none  of  you  were  taught  about  the  cyclides  of  Dupin 
in  high  school,!*'  I’d  better  start  with  the  broken  symmetry  and  work  forward. 

Smectic  liquids  form  equally  spaced  layers.  Some  of  them  are  compounds  that, 
like soap, naturally form membranes and films: I think smectic is the Greek word
for soap. Others are long thin molecules like nematics, which for some reason not
only  line  up  but  segregate  into  planes  (Figure  2).  The  molecules  have  liquid-like 
order  in  the  planes.  Like  crystals,  they  have  a  broken  translational  symmetry,  but 
only  in  one  of  the  three  directions. 

Now,  the  important  excitations  for  smectics  are  those  that  bend  the  layers.  In 
Figure  3,  we  see  a  two-dimensional  analogue  of  the  smectic  liquid  crystals:  equally 
spaced  curves  in  the  plane.  Suppose  we  start  with  one  curve  and  work  outward. 
As  you  can  see  from  the  figure,  the  next  curve  is  not  precisely  the  same  shape: 
keeping  the  surfaces  at  an  equal  spacing  makes  concave  regions  become  sharper  and 
convex  regions  become  more  rounded.  It  is  easy  to  see  that  eventually  the  concave 
regions  will  become  pinched:  these  pinches  are  the  defects.  They  are  not  topological 
defects,  since  rounding  them  a  bit  makes  them  go  away:  they  are  geometrical  defects 
produced  by  the  constraint  of  equal  layer  spacing. 
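
The pinching can be checked numerically. Here is a minimal Python sketch, not from the original text, that assumes a specific starting layer (the parabola $y = x^2$) and places a second layer a fixed distance $d$ away on its concave side. The function names and the choice of parabola are illustrative assumptions. A cusp shows up as a point where the new curve reverses direction relative to the original one.

```python
import math

def offset_parabola(t, d):
    """Point a distance d along the inward (concave-side) normal of y = x**2 at x = t."""
    norm = math.sqrt(1.0 + 4.0 * t * t)
    # The unit normal pointing into the concave side is (-2t, 1) / norm.
    return (t - d * 2.0 * t / norm, t * t + d / norm)

def count_cusps(d, n=2001, tmax=2.0):
    """Count direction reversals of the offset curve for -tmax <= t <= tmax."""
    ts = [-tmax + 2.0 * tmax * k / (n - 1) for k in range(n)]
    pts = [offset_parabola(t, d) for t in ts]
    forward = []
    for k in range(n - 1):
        dx = pts[k + 1][0] - pts[k][0]
        dy = pts[k + 1][1] - pts[k][1]
        tm = 0.5 * (ts[k] + ts[k + 1])
        # Compare the step along the offset curve with the parabola's tangent (1, 2t).
        forward.append(dx + dy * 2.0 * tm > 0.0)
    return sum(1 for a, b in zip(forward, forward[1:]) if a != b)

if __name__ == "__main__":
    for d in (0.25, 1.0):
        print(d, count_cusps(d))
```

The parabola's smallest radius of curvature is 1/2, at its vertex. For spacings below that the offset layer stays smooth; beyond it, a pair of cusps appears, which is the two-dimensional version of the defects described above.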

1^1 Bertrand Fourcade tells me that even the French stopped teaching them.
FIGURE 3 Equally Spaced Layers; Defect Formation. Here we consider a two-
dimensional  analogue  of  a  smectic  liquid  crystal.  The  smectic  layers  are  represented 
by  curves  in  the  plane.  The  lowest  energy  state,  of  course,  consists  of  parallel  straight 
layers,  but  the  layers  often  settle  into  more  complicated  patterns,  with  defects.  For 
reasons  that  we  discuss  in  this  paper,  and  which  are  not  completely  understood, 
smectic  layers  will  deform  by  bending,  but  will  remain  strictly  equally  spaced  (except 
very  near  boundaries  and  defects).  The  constraint  of  equal  layer  spacing  has  weird 
nonlocal  consequences.  First,  one  can  see  that  as  one  moves  outward  the  concave 
regions  become  more  pinched,  and  eventually  form  cusps.  Second,  one  can  see  that 
a  line  perpendicular  to  one  layer  (a  generator)  will  be  perpendicular  to  the  next  one, 
too. These generators intersect on a surface known as the evolute, and it is when the
layers  hit  the  evolute  that  a  defect  occurs.  As  one  sees  here,  the  defect  is  a  line  of 
pinched  surfaces:  in  three  dimensions  it  is  typically  a  two-dimensional  surface.  This 
costs  lots  of  energy.  The  only  way  in  two  dimensions  to  have  a  point-like  low-energy 
defect  is  to  have  concentric  circles:  only  circles  have  zero-dimensional  evolutes.  The 
only way in three dimensions to have one-dimensional evolutes^ is to have cyclides
of  Dupin;  the  defects  are  ellipses  and  hyperbolas  passing  through  one  another's  foci 
(Figures  1  and  4). 
FIGURE 4 Focal Conic Defect. Here we see the smectic surfaces which form the
focal conic defects seen in Figure 1. These are the cyclides of Dupin. The surfaces
go  from  banana-shaped  to  squashed  doughnuts  to  apple-shaped.  The  points  on  the 
bananas  and  the  dimples  at  the  stem  and  bottom  of  the  apples  are  defects,  which 
scatter  light  and  show  up  in  Figure  1.  (Only  the  dimples  of  the  apple  are  shown.) 
The  banana  defects  lie  on  an  ellipse,  and  the  apple  defects  lie  on  a  hyperbola  which 
passes  through  the  focus  of  the  ellipse.  Usually,  the  whole  pattern  isn’t  found  in  the 
experimental sample. As you see in Figure 1, the domains aggregate together in
clumps.  Each  ellipse  in  Figure  1  has  a  conical  region  for  its  smectic  layers. 


Most curves, like the one shown in Figure 3, form one-dimensional pinched regions;
only concentric circles and structures made from them can keep the pinched
regions to points. In three dimensions, the only equally spaced surfaces with points
as pinched regions are concentric spheres. Now, what Friedel knew and you don’t
know is that the only three-dimensional surfaces with one-dimensional line-like
defects are the cyclides of Dupin,'’ and the pinched regions form ellipses and
hyperbolas}^^

Figure  4  shows  the  cyclides  of  Dupin.  Notice  that  they  pinch  off  on  two  curves: 
an  ellipse  and  a  hyperbola.  The  hyperbola  is  perpendicular  to  the  plane  of  the 
ellipse,  and  passes  through  its  focus.  That’s  what  you  see  streaming  out  of  the 
foci  in  the  photo,  and  why  you  don’t  see  one  for  each  focus.  My  contribution  to 


Actually, the canal surfaces also have singularities confined to one-dimensional regions,^ but
let’s  not  get  bogged  down. 
FIGURE 5 Focal Conic Defect Meshing onto Concentric Spheres. The conical regions
in  Figure  4  combine  into  compound  defects  by  meshing  onto  the  concentric  sphere 
defect.  Concentric  spheres  are  the  only  surfaces  with  zero-dimensional  defects.  The 
surfaces  on  the  edges  of  the  cones  mesh  smoothly  onto  the  concentric  spheres. 


the field (with Maurice Kleman) was to realize that these cyclides of Dupin fit together
nicely inside concentric spheres, which explained neatly the ways the ellipses
always seemed to fit together (Figure 5). Maybe the concentric spheres form because
the layers nucleate on a dust particle on one of the microscope slides: when
the  spheres  touch  the  other  slide,  the  concentric  spheres  get  twisted  (they  like  to 
sit  perpendicular  to  the  glass)  and  the  ellipses  and  hyperbolas  form  to  relieve  the 
strain. 

Now,  why  do  I  show  you  this?  It  isn’t  just  to  show  that  there  is  more  to  the  world 
than  topology.  Mostly,  it’s  to  illustrate  the  two  themes  of  this  paper:  constraints 
and  expulsion. 

If we define an order parameter $\mathbf{n}$ for the smectic to be the unit normal to
the smectic layers ($\mathbf{n}^2 = 1$), then the constraint that the layers be equally spaced
implies

$$\operatorname{curl}\,\mathbf{n} = \begin{pmatrix} \partial n_z/\partial y - \partial n_y/\partial z \\ \partial n_x/\partial z - \partial n_z/\partial x \\ \partial n_y/\partial x - \partial n_x/\partial y \end{pmatrix} = 0. \qquad (1)$$


(This  is  derived,  for  those  who  know  a  bit  about  vector  calculus,  in  the  Appendix.) 
This  is  a  remarkably  powerful  constraint.  For  example,  knowing  the  position  of  one 
layer  determines  all  the  others!  We  show  this  mathematically  in  the  Appendix,  but 
you  saw  it  physically  in  Figure  3:  given  one  layer,  there  is  only  one  way  to  place 
the  next  one  preserving  exactly  equal  spacing. 
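
As a quick numerical illustration of this constraint, here is a small Python sketch (my own, not from the Appendix). It works in two dimensions, where the curl has only one component, $\partial n_y/\partial x - \partial n_x/\partial y$, and compares two hypothetical layer patterns: concentric circles, which are equally spaced, and the nested similar ellipses $x^2 + 4y^2 = \text{const}$, which are not. The function names and the test point are assumptions chosen for the example.

```python
import math

def curl_z(normal, x, y, h=1e-5):
    """Centered finite-difference estimate of d(n_y)/dx - d(n_x)/dy at (x, y)."""
    dny_dx = (normal(x + h, y)[1] - normal(x - h, y)[1]) / (2.0 * h)
    dnx_dy = (normal(x, y + h)[0] - normal(x, y - h)[0]) / (2.0 * h)
    return dny_dx - dnx_dy

def normal_circles(x, y):
    """Unit normal to concentric circles about the origin (equally spaced layers)."""
    r = math.hypot(x, y)
    return (x / r, y / r)

def normal_ellipses(x, y):
    """Unit normal to the level curves of x**2 + 4*y**2 (layers not equally spaced)."""
    gx, gy = 2.0 * x, 8.0 * y
    g = math.hypot(gx, gy)
    return (gx / g, gy / g)

if __name__ == "__main__":
    print(curl_z(normal_circles, 1.0, 1.0))   # vanishes up to rounding error
    print(curl_z(normal_ellipses, 1.0, 1.0))  # clearly nonzero
```

The circles satisfy Eq. (1) and the similar ellipses violate it, which is why the latter cannot be smectic layers at fixed spacing.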

There  is  a  pretty  good  analogy  here  to  analytic  continuation.  For  those  of  you 
who know about complex analysis, you know that an analytic function obeys the
Cauchy-Riemann equations. If we let $n(x + iy) = n_x(x + iy) + i\,n_y(x + iy)$ be an
analytic function, then

$$\begin{pmatrix} \partial n_x/\partial x - \partial n_y/\partial y \\ \partial n_x/\partial y + \partial n_y/\partial x \end{pmatrix} = 0. \qquad (2)$$

As  you  know,  analytic  functions  have  really  bizarre  properties.  If  you  know  an 
analytic  function  in  a  small  region,  you  can  figure  it  out  everywhere  else,  just  like 
the  order  parameter  in  smectics.  The  point  singularities  of  analytic  functions  have 
a  rich  and  interesting  classification  (simple  poles,  essential  singularities,...).  Both 
in  analytic  functions  and  in  our  smectic  problem,  constraints  on  the  derivatives  of 
our  order  parameters  produced  really  bizarre,  nonlocal,  geometrical  consequences. 


II.  MASSIVE  FIELDS 

We’ve  discovered  that  constraints  can  have  beautiful,  geometrical  consequences. 
How  are  the  constraints  enforced?  Clearly,  it  is  possible  to  stretch  the  smectic 
layers  apart  or  to  compress  them  together;  why  doesn't  this  happen  in  practice, 
especially  when  the  layers  are  being  bent  and  twisted?  The  curl  of  n  is  constrained 
to  zero.  Why  are  magnetic  fields  pushed  completely  out  of  superconductors?  The 
magnetic  field  is  constrained  to  zero.  Why  isn’t  it  possible  to  find  an  isolated  quark 
in  nature?  Quarks  have  non-zero  “color,”  and  the  net  color  is  constrained  to  zero. 

These  constraints  come  from  minimizing  the  energy.  Saying  that  magnetic  fields 
can  happen  inside  superconductors  is  just  like  saying  that  marbles  can  sit  on  the 
side  of  a  hill;  it  can  happen,  but  not  if  the  marbles  are  allowed  to  roll  to  minimize 
their  energy.  Under  what  conditions  does  the  energy  enforce  a  constraint?  We  say 
that  it  happens  when  the  order  parameter  field  develops  a  mass.  We’ll  explain  this 
term  in  a  moment,  but  let’s  first  give  a  simple  example. 

Suppose we have a fluid in one dimension. The density of a fluid is the important
variable in describing its state. Suppose the density of the fluid is $\rho_0 + \rho(x)$, where
$\rho_0$ is the ideal density (which the fluid would have if left to itself) and the order
parameter $\rho(x)$ describes the deviation from the ideal density. A sensible free energy
might be

$$\mathcal{E}_{\text{fluid}} = \int dx\, \tfrac{1}{2}\left(\frac{d\rho}{dx}\right)^2 + \tfrac{1}{2}\, m \rho^2. \qquad (3)$$

The  first  term  in  the  energy  resists  sudden  changes  in  the  density:  having  a  high- 
density  region  right  next  to  a  low-density  region  costs  extra.  The  second  term  in  the 
energy  says  that  deviations  from  the  mean  density  cost  energy,  with  m  a  coefficient 
which  says  how  much  deviations  cost.  Unlike  phonons,  where  the  order  parameter 
u(x)  could  be  uniformly  shifted  without  energy  cost,  here  the  lowest-energy  state 
happens when the density is at its mean value $\rho(x) = 0$.

What happens when we try to find the minimum energy state? Clearly the
best we can get is the ideal state $\rho(x) = 0$, which has zero energy $\mathcal{E}_{\text{fluid}}$. Perhaps,
though, we’re pulling on the density at the two ends (Figure 6). If the liquid is in a
trough of length $L$, we’ll insist that $\rho(0) = \rho_i$ and $\rho(L) = \rho_f$. What configuration
$\rho(x)$ minimizes the energy then? Clearly, it should sag towards $\rho_0$ inside, but how?

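Here I’ll show you a simple case of what’s called the calculus of variations. I
apologize for the math, but it is really a useful method. The trick is to realize that
if $\rho(x)$ is the minimum energy configuration, then $\rho(x) + \delta(x)$ must have a higher
energy, whatever $\delta(x)$ we might choose.

$$\mathcal{E}(\rho + \delta) - \mathcal{E}(\rho) = \int dx\, \frac{d\rho}{dx}\frac{d\delta}{dx} + m\rho(x)\delta(x) + \tfrac{1}{2}\left(\frac{d\delta}{dx}\right)^2 + \tfrac{1}{2}\, m \delta^2 \ge 0. \qquad (4)$$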


FIGURE 6 Massive Fields Decay Exponentially. Minimizing the energy $\mathcal{E}_{\text{fluid}}$ of
Eq. (3), with boundary conditions $\rho(0) = \rho_i$ and $\rho(L) = \rho_f$. It is easy to
understand physically what is happening. The system wants to achieve $\rho = 0$, and it
sags to that value as quickly as it can, balancing the costs of $(d\rho/dx)^2$ energy against
the gain. The solution decays exponentially to zero with a decay constant $\sqrt{m}$.
Now, if we confine our attention to small $\delta$, we can ignore the last two terms
(because they are quadratic, rather than linear, in $\delta$). The first term we integrate
by parts, so

$$\int_0^L dx\, \frac{d\rho}{dx}\frac{d\delta}{dx} = \left.\left(\delta\,\frac{d\rho}{dx}\right)\right|_0^L - \int_0^L dx\, \delta\, \frac{d^2\rho}{dx^2}. \qquad (5)$$


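Now, $\delta$ mustn’t change the values at the endpoints, so $\delta(0) = \delta(L) = 0$, and the
boundary terms in Eq. (5) drop out. We're left, then, with the equation

$$\mathcal{E}(\rho + \delta) - \mathcal{E}(\rho) \approx \int dx \left(-\frac{d^2\rho}{dx^2} + m\rho(x)\right)\delta(x) \ge 0. \qquad (6)$$

Now, this must be true for any $\delta(x)$ we choose. Since replacing $\delta$ by $-\delta$ flips the
sign of the integral, this can only happen if $-d^2\rho/dx^2 + m\rho(x) = 0$, so $\rho'' = m\rho$.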

The solutions to this equation are, of course, $\rho = A e^{-\sqrt{m}\,x} + B e^{\sqrt{m}\,x}$. We can
vary the arbitrary constants $A$ and $B$ to match the boundary conditions $\rho(0) = \rho_i$
and $\rho(L) = \rho_f$, and we see (Figure 6) that $\rho$ is expelled from the interior. Pulling
it on the boundary only affects a region of length $1/\sqrt{m}$, and the order parameter
exponentially decays into the bulk. $\rho$ is constrained to zero in the inside of the
sample!
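
For readers who would like to see numbers, here is a minimal Python sketch of this solution. It is an illustration under stated assumptions, not part of the original argument: the two exponentials are rewritten as hyperbolic sines (the same family of solutions, with $A$ and $B$ already fixed by the boundary conditions), and the parameter values in the usage lines are made up.

```python
import math

def rho(x, length, m, rho_i, rho_f):
    """Solution of rho'' = m * rho with rho(0) = rho_i and rho(length) = rho_f, for m > 0."""
    k = math.sqrt(m)
    return (rho_i * math.sinh(k * (length - x))
            + rho_f * math.sinh(k * x)) / math.sinh(k * length)

def residual(x, length, m, rho_i, rho_f, h=1e-3):
    """Finite-difference estimate of rho''(x) - m * rho(x); should be close to zero."""
    second = (rho(x + h, length, m, rho_i, rho_f)
              - 2.0 * rho(x, length, m, rho_i, rho_f)
              + rho(x - h, length, m, rho_i, rho_f)) / (h * h)
    return second - m * rho(x, length, m, rho_i, rho_f)

if __name__ == "__main__":
    length, m = 10.0, 4.0          # illustrative values; decay length 1/sqrt(m) = 0.5
    for x in (0.0, 0.5, 1.0, 5.0, 10.0):
        print(x, rho(x, length, m, 1.0, 0.5))
```

With these values the order parameter in the middle of the trough is smaller than at the ends by roughly four orders of magnitude, which is the expulsion described above.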


FIGURE 7 Superconductors Expel Magnetic Fields. A magnetic field passing
through a metal will be pushed out when the metal is cooled through its
superconducting transition temperature. This can happen in two different ways.
In type I superconductors like lead (chemical symbol Pb), the magnetic field is
pushed entirely outside the sample. In type II superconductors like niobium (Nb), the
magnetic field is broken up and confined to defect lines called vortices. In both cases,
the magnetic field is swept out of the remainder of the sample. The magnetic field
penetrates a distance $\Lambda \sim 100$ Å into the sample from the boundaries or from the
vortex lines.
FIGURE 8 (a) Superfluid Free Energy, $T > T_c$: Unbroken Symmetry. The free
energy for a normal metal or fluid, above the superconducting or superfluid transition
temperature, for a uniform order parameter field $\psi$. The vertical axis represents the
energy $\alpha|\psi|^2 + \beta|\psi|^4$ and the horizontal axes represent the real and imaginary parts
of $\psi$. The coefficient $\alpha > 0$, so the minimum of the energy is at $\psi = 0$. Notice that the
energy is invariant under the symmetry $\psi \to e^{i\theta}\psi$ (corresponding to rotating the figure
about the vertical). This is a symmetry of the free energy. Notice also that the lowest
energy state $\psi = 0$ is also unchanged by this rotation: the symmetry is unbroken above
$T_c$. (b) Superfluid Free Energy, $T < T_c$: Broken Symmetry. The free energy $\mathcal{E}_{\text{superfluid}}$
for helium below the superfluid transition temperature. The energy now looks like a
Mexican hat: it is still invariant under rotations about the vertical axis. Since now $\alpha < 0$,
the energy is at a minimum along a circle, of radius $\rho_0 = \sqrt{-\alpha/2\beta}$ and arbitrary phase
$\theta$. The superfluid must choose between these various possible phases, and that choice
breaks the symmetry. This is a good example of spontaneous symmetry breaking: just
as the magnetization of a magnet selects a direction in space and breaks rotational
invariance, the superconductor picks out a value of $\theta$.


Why  do  we  call  this  a  mass?  The  name  comes  from  particle  physics.  The  photon 
is massless. Two charges $e_1$ and $e_2$ separated by a distance $r$ interact by a force
whose magnitude goes as $e_1 e_2/r^2$: this is Coulomb’s law. The particle physicists
interpret this force in terms of the two particles exchanging “virtual” photons. (I
think of the $1/r^2$ decay as the virtual photons being diluted over a sphere of radius
$r$.) Now, the strong interaction between protons and neutrons has a different form;
the force between them is always attractive, and falls off exponentially with distance.
This exponential
decay is extremely important, since it keeps the nuclei of different atoms from
attracting  one  another.  (We’d  all  have  collapsed  into  neutron  stars  or  worse  were 
it  not  there!)  At  long  distances,  the  particle  physicists  interpret  this  force  as  the 
proton  and  neutron  exchanging  virtual  pions.bl  Since  the  pion  isn’t  massless,  the 

1^1  At  shorter  distances,  the  picture  is  quarks  exchanging  gluons.  The  gluons  have  color,  though, 
so  the  proton  and  neutron  can’t  exchange  them  at  long  distances.  Since  colorless  glueballs,  if 
virtual pion field decays exponentially for exactly the same reason that $\rho(x)$ decayed
in  our  example  above. 

So,  to  enforce  a  constraint,  we  need  to  give  the  corresponding  field  a  mass.  Let’s 
see  how  that  is  done. 


III. THE MEISSNER-HIGGS EFFECT

In  this  section,  I  want  to  explain  how  superconductors  expel  magnetic  field.  This  is  a 
really  beautiful  argument,  which  I’ve  basically  taken  from  Coleman’s  presentation.^ 
I’m  afraid  that  there  is  some  math  and  a  lot  of  physics  that  I  need  to  introduce. 
Most  of  you  will  get  lost  at  some  point:  skip  onto  the  next  section  when  you  tire 
of  this  one. 

A.  INTRODUCTION  TO  THE  MEISSNER  EFFECT.  Superconductors  are  named  for 
their  ability  to  carry  currents  of  electricity  with  absolutely  no  losses.  They  have 
another,  closely  related  property  which  is  no  less  amazing:  they  are  a  perfect  shield 
for  magnetic  fields.  Remember  the  old  science  fiction  stories  about  the  scientist  who 
finds  a  material  which  is  impervious  to  the  gravitational  field,  paints  the  bottom  of 
his  spacecraft  with  it,  and  falls  to  the  moon?  Superconductors  work  that  way  for 
magnetic  fields. 

Ashcroft and Mermin have a nice, not too technical discussion of superconductors
in one of the last chapters in their textbook.* Figure 7 shows the two types
of superconductors, represented by lead and niobium. At high temperatures, when
the materials aren’t superconducting, the magnetic field penetrates the materials
almost as if they weren’t there. (Iron would pull the magnetic field lines inward.)
Lead, when superconducting, pushes the magnetic field out: just as for the example
in section II, the field a distance $r$ inward from the boundary decays like
$B = B_0 e^{-r/\Lambda}$. If you put too high a field, the lead will give up and let the field in,
but it will stop superconducting.

On  the  right,  we  see  that  niobium  behaves  a  bit  differently.  It  expels  small 
magnetic  fields  like  lead  does,  but  larger  fields  are  pushed  into  thin  threads,  called 
vortex  lines.  These  two  general  categories  are  (rather  unimaginatively)  called  type 
I  and  type  II  superconductors.  The  vortex  lines  are  the  topological  defects  for 
the superconductor.® Superconductors are described by a complex number $\psi = \rho e^{i\theta}$,
whose magnitude $|\psi| = \rho$ is roughly constant. The order parameter at low
temperatures is the phase $\theta$, and thus the order parameter space is a circle $S^1$.
A vortex line must pass through any loop around which the phase of the order
parameter changes by $2\pi$. The magnetic field in type II superconductors decays

they  exist,  are  much  more  massive  than  pions,  the  dominant  interaction  for  long  distances  is  pion 
exchange. 
like $B = B_0 e^{-r/\Lambda}$, where here $r$ is the distance to the vortex line. The magnetic

field  is  squeezed  out  of  the  bulk  of  the  material  into  these  defects. 
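
The statement that a vortex line threads any loop around which the phase winds by $2\pi$ can be tested numerically. The following Python sketch is my own illustration, with hypothetical names throughout. It assumes the simplest model vortex, $\psi = x + iy$, whose phase is the polar angle about the origin, and counts how many times the phase winds as we walk around a circular loop.

```python
import cmath
import math

def winding_number(psi, center=(0.0, 0.0), radius=1.0, steps=400):
    """Net number of 2*pi phase turns of psi(x, y) around a circular loop."""
    cx, cy = center
    total = 0.0
    previous = cmath.phase(psi(cx + radius, cy))
    for k in range(1, steps + 1):
        angle = 2.0 * math.pi * k / steps
        current = cmath.phase(psi(cx + radius * math.cos(angle),
                                  cy + radius * math.sin(angle)))
        step = current - previous
        # Fold each phase step into [-pi, pi) so branch-cut jumps are not counted.
        step = (step + math.pi) % (2.0 * math.pi) - math.pi
        total += step
        previous = current
    return round(total / (2.0 * math.pi))

def model_vortex(x, y):
    """Assumed model order parameter with a single vortex at the origin."""
    return complex(x, y)

def uniform_state(x, y):
    """Constant order parameter: no vortex anywhere."""
    return complex(1.0, 0.0)

if __name__ == "__main__":
    print(winding_number(model_vortex))                     # loop encircles the vortex
    print(winding_number(model_vortex, center=(3.0, 0.0)))  # loop misses the vortex
    print(winding_number(uniform_state))
```

Only the loop that encircles the origin picks up a net winding, so the integer returned is a count of the vortex lines passing through the loop.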

So,  the  magnetic  field  isn’t  actually  stopped;  it  just  peters  out.  What  kind  of 
a  leaky  shield  is  that?  Actually,  it’s  about  as  good  as  one  can  hope:  after  all,  the 
magnetic  field  won’t  be  able  to  tell  it’s  in  a  superconductor  until  it  gets  inside 
a  bit!  (Atoms  don’t  go  superconducting,  only  huge  heaps  of  atoms  together  do, 
so  the  field  has  to  pass  through  a  heap  or  two  to  realize  that  it  isn’t  wanted.) 
Anyhow, $\Lambda$ is usually pretty small, a few hundred Angstroms or so. A 0.1 mm
thin layer of superconducting paint naively would let through a field only
$e^{-10000} \approx 10^{-4000}$ of the original. Unfortunately, it usually doesn’t work so well: a
few  vortex  lines  get  stuck  on  junk  in  the  paint,  and  let  in  comparatively  large  fields. 
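
The size of this naive attenuation is easy to check, with one wrinkle: the number is too small for ordinary floating point, so it is better to work with its logarithm. The short Python sketch below is only an illustration; it assumes a penetration depth of 100 Å (the value quoted in the Figure 7 caption) and a hypothetical function name.

```python
import math

def log10_attenuation(thickness, penetration_depth):
    """log10 of exp(-thickness / penetration_depth); both lengths in the same units."""
    return -(thickness / penetration_depth) / math.log(10.0)

if __name__ == "__main__":
    thickness = 1e-4          # 0.1 mm, in meters
    penetration_depth = 1e-8  # 100 Angstroms, in meters (assumed value)
    print(log10_attenuation(thickness, penetration_depth))
    # Evaluating the exponential directly underflows to zero in double precision.
    print(math.exp(-thickness / penetration_depth))
```

The exponent comes out a bit below $-4000$, consistent with the rough estimate in the text.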

Before  we  can  explain  the  repulsion  of  magnetic  fields,  we  should  explore  the 
broken symmetry. Let’s start with superfluids, which are simpler.

B. SUPERFLUID FREE ENERGY AND SPONTANEOUS SYMMETRY BREAKING. The
order parameter for a superfluid, just as for a superconductor, is a complex number
$\psi$. The free energy for the superfluid is usually written asl^l

$$\mathcal{E}_{\text{superfluid}} = \int dV\, |\nabla\psi|^2 + \alpha|\psi|^2 + \beta|\psi|^4. \qquad (7)$$

Above the superconducting transition temperature $T_c$, the coefficient $\alpha > 0$. If we
imagine a constant order parameter field, the free energy forms a bowl (Figure 8(a))
with a minimum at zero, as a function of the real and imaginary part of $\psi$. Zero
order parameter corresponds to a normal metal (for a superconductor) or a normal
liquid  (for  a  superfluid). 

Below $T_c$, $\alpha < 0$, and the potential is at a minimum for $\rho_0 = |\psi| = \sqrt{-\alpha/2\beta}$;
the potential in the complex plane looks like a Mexican hat (Figure 8(b)). Now
there are many possible ground states: for any $\theta$, a constant order parameter field
$\psi = \rho_0 e^{i\theta}$ is a ground state. Because the free energy depends only on $|\psi|$ and
$|\nabla\psi|$, it is symmetric to changing the phase $\theta$: the superconducting state chooses
a specific value for $\theta$ and, thus, spontaneously breaks the symmetry. The circle of
ground states in the brim of the Mexican hat is the order parameter space for the
superconductor.

We can write the free energy in terms of $\rho$ and $\theta$:

$$\mathcal{E}_{\text{superfluid}} = \int dV\, |\nabla\rho|^2 + \rho^2|\nabla\theta|^2 + \alpha\rho^2 + \beta\rho^4. \qquad (8)$$


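l^lThere are two new symbols here: $\nabla = (\partial/\partial x, \partial/\partial y)$ and $|\chi|^2 = \chi^*\chi$, where $\chi^*$ is the complex
conjugate of $\chi$. Written out in components,

$$|\nabla\psi|^2 = \left(\frac{\partial\psi}{\partial x}\right)^{*}\left(\frac{\partial\psi}{\partial x}\right) + \left(\frac{\partial\psi}{\partial y}\right)^{*}\left(\frac{\partial\psi}{\partial y}\right).$$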


You  can  think  of  this  as  a  mathematical  expression  of  the  Mexican  hat  potential  in  Figure  8(b), 
together  with  a  resistance  to  abrupt  changes  in  the  order  parameter. 
As we discussed in the previous section, $\rho$ is “massive.” In Figure 8(b), if we
vary $\rho$ slightly away from $\rho_0$, the energy increases quadratically:
$\alpha\rho^2 + \beta\rho^4 - (\alpha\rho_0^2 + \beta\rho_0^4) \approx (\alpha + 6\beta\rho_0^2)(\rho - \rho_0)^2$.
The effective free energy for $\rho$ near $\rho_0$ is precisely of the
form (3) (except for unimportant constant shifts), with $m = \alpha + 6\beta\rho_0^2$. Thus just as
before, $\rho$ will rapidly be drawn to its minimum energy state $\rho_0$. Because $\rho$ is massive,
it is basically constrained to stay at its minimum value. This is why it is ignored at
low temperatures in writing the order parameter field. Here, the constraint doesn't
do anything interesting: our next constraint will be more interesting.
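
The quadratic expansion around the brim of the hat can be checked with a few lines of Python. This is a sketch under assumptions of my own: the values $\alpha = -1$, $\beta = 0.5$ are arbitrary, the function names are hypothetical, and the radius of the brim is written with an explicit minus sign, $\rho_0 = \sqrt{-\alpha/2\beta}$, so that the square root is real when $\alpha < 0$.

```python
import math

def potential(rho, alpha, beta):
    """Uniform-field energy density alpha * rho**2 + beta * rho**4."""
    return alpha * rho**2 + beta * rho**4

def rho_min(alpha, beta):
    """Radius of the circle of minima for alpha < 0 and beta > 0."""
    return math.sqrt(-alpha / (2.0 * beta))

def mass(alpha, beta):
    """Coefficient of (rho - rho_0)**2 in the expansion about the minimum."""
    return alpha + 6.0 * beta * rho_min(alpha, beta) ** 2

if __name__ == "__main__":
    alpha, beta = -1.0, 0.5      # illustrative values below the transition
    r0 = rho_min(alpha, beta)
    for delta in (0.01, 0.02, 0.05):
        exact = potential(r0 + delta, alpha, beta) - potential(r0, alpha, beta)
        quadratic = mass(alpha, beta) * delta**2
        print(delta, exact, quadratic)
```

Substituting $\rho_0^2 = -\alpha/2\beta$ gives a coefficient of $-2\alpha$, which is positive below the transition, as a mass should be.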

The $\theta$ field keeps the symmetry of the original model; rotating it by a constant angle
doesn't change the energy a bit. It is a Goldstone mode for our problem, and long
wavelength plane waves produce what is known as “second sound” in superfluids.
Second sound turns out to be heat waves: pulses of temperature which propagate
like  waves  through  the  superfluid. 


C. SUPERCONDUCTING FREE ENERGY AND THE HIGGS MECHANISM. To describe
the expulsion of magnetic field from superconductors, I have to tell you how magnetic
fields interact with the superconducting order. I'm afraid this will be rather
sketchy, and I apologize for trying.

First  of  all,  the  particles  which  superconduct  are  pairs  of  electrons.  Electrons 
are  charged,  and  repel  one  another  with  electric  fields.  Thus  the  electrons  interact 
with electric fields. We learn in the second semester of physics (if we're lucky) that
electric  and  magnetic  fields  are  closely  related  to  one  another.  (This  was  discovered 
by  Einstein:  a  moving  electric  E  field  develops  a  magnetic  B  component.) 

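Now, the E and B fields can be written at the same time in terms of another
field $A$. It is this new field which is easiest to work with. In particular,

$$B = \operatorname{curl} A = \left(\frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z},\; \frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x},\; \frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right) . \tag{9}$$

The magnetic energy is $\mathcal{F}_{\text{magnetic}} = \int dV\, B^2$.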

Now, you remember that I mentioned earlier that light (the photon) is massless?
You may know that light is sometimes called "electromagnetic radiation." The
"order parameter field" for light is precisely the $A$ field. We can see by expanding
$B^2$ in terms of $A$ that the energy for the $A$ field

$$\mathcal{F}_{\text{magnetic}} = \int dV \left(\frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z}\right)^2 + \cdots \tag{10}$$

doesn't have any terms like $A^2$. When we add the energy from the electric fields,
this is still true: light is massless because the electromagnetic energy involves only
derivatives of $A$.

Now, I need to know how the electromagnetic order parameter $A$ interacts
with the superconducting order parameter $\psi$. I'll just tell you. The free energy for
a superconductor looks like

$$\mathcal{F}_{\text{superconductor}} = \int dV\, |\nabla\psi - iA\psi|^2 + \alpha|\psi|^2 + \beta|\psi|^4 + B^2 . \tag{11}$$
If we set $\psi = 0$, we get the magnetic energy for the $A$ field. If we set $A = 0$,
we get the superfluid energy (8). I don't know of a way to motivate the way in
which we couple the $A$ field to the gradient $\nabla\psi$. I don't think anyone has a simple
derivation. This way of connecting the two is called "minimal coupling," which just
gives a name to the unexplained fact that the simplest way of coupling the two
gives the right answer.

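Now, if we assume $T < T_c$, so $\alpha < 0$ and $\rho \approx \rho_0$, we find, up to an additive constant,

$$\mathcal{F}_{\text{superconductor}} \approx \int dV\, \rho_0^2\, |\nabla\theta - A|^2 + B^2 . \tag{12}$$

We want to know if $A$ or $\theta$ is going to develop a mass. The problem is: $\mathcal{F}_{\text{superconductor}}$
doesn't look quite like the form (3) for either one. If we combine the two into a
new order parameter field $C = \nabla\theta - A$, and use the fact that the second partial
derivative $\partial^2\theta/\partial z\,\partial y = \partial^2\theta/\partial y\,\partial z$, we see that

$$\mathcal{F}_{\text{superconductor}} \approx \int dV\, \rho_0^2\, C^2 + (\operatorname{curl} C)^2 . \tag{13}$$

Thus the new, combined field $C$ is massive. $C$ will be constrained to zero in the
bulk, exponentially decaying like $C_0 e^{-\rho_0 x}$. The magnetic field $B = -\operatorname{curl} C$ thus also
decays, and the penetration depth $\Lambda = 1/\rho_0$.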
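The exponential expulsion can be seen numerically in a one-dimensional sketch. The code below assumes a single transverse component of $C$ that depends only on the depth $x$ into the sample, so that the energy density is $\rho_0^2 C^2 + (dC/dx)^2$; the function names, the grid, and the value of `rho0` are illustrative choices made here, not part of the original argument.

```python
import math

def meissner_profile(rho0=2.0, c_surface=1.0, depth=10.0, n=2000):
    """Minimize sum_i h*[rho0^2 C_i^2 + ((C_{i+1} - C_i)/h)^2]
    with C fixed at the surface (x = 0) and C = 0 at x = depth."""
    h = depth / n
    diag = 2.0 + (rho0 * h) ** 2
    m = n - 1                      # unknowns C_1 .. C_{n-1}
    # Stationarity gives -C_{i-1} + diag*C_i - C_{i+1} = 0 (a tridiagonal system),
    # solved here with the Thomas algorithm.
    cp = [0.0] * m                 # modified upper-diagonal coefficients
    dp = [0.0] * m                 # modified right-hand side
    cp[0] = -1.0 / diag
    dp[0] = c_surface / diag       # boundary value C_0 moved to the right-hand side
    for i in range(1, m):
        denom = diag + cp[i - 1]
        cp[i] = -1.0 / denom
        dp[i] = dp[i - 1] / denom  # interior right-hand sides are zero
    c = [0.0] * (n + 1)            # c[n] = 0 is the far boundary
    c[0] = c_surface
    c[m] = dp[m - 1]
    for i in range(m - 1, 0, -1):  # back substitution
        c[i] = dp[i - 1] - cp[i - 1] * c[i + 1]
    xs = [i * h for i in range(n + 1)]
    return xs, c

def penetration_depth(xs, c, x_probe=1.0):
    """Estimate the decay length from C(x_probe)/C(0) = exp(-x_probe/Lambda)."""
    i = min(range(len(xs)), key=lambda k: abs(xs[k] - x_probe))
    return -xs[i] / math.log(c[i] / c[0])

xs, c = meissner_profile(rho0=2.0)
print("estimated penetration depth:", penetration_depth(xs, c))
```

With `rho0 = 2.0` the estimated decay length should come out close to $1/\rho_0 = 0.5$, up to a small discretization error.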

We started with a massless photon field $A$ and a massless Goldstone mode
$\theta$. We ended up with only one field $C$, with a mass. Did we lose something? No,
actually $C$ has three components: two components corresponding to the original
two polarizations of light and one component corresponding to the Goldstone mode.
Coleman says "the Goldstone boson eats the photon, and gains a mass!"

The Weinberg-Salam theory of the weak interaction is exactly analogous to the
theory of superconductivity. The role of lead or niobium is played by the vacuum.
The free energy of the universe has an $SU(3)$ symmetry, which is spontaneously
broken to a smaller symmetry $SU(2) \times U(1)$. The $W^\pm$ and $Z$ bosons which now
mediate the weak interaction used to be massless: they and the photon were all part
of one big $A$ field. If current theories of cosmology are true, this "superconducting"
transition occurred in the first instants after the Big Bang.

Now,  after  explaining  superconductors,  the  weak  interaction,  and  the  phase 
transition  in  the  early  universe,  let’s  return  to  why  we  don’t  understand  crystals. 
FIGURE 9 Polycrystal. Many crystalline materials, such as metals, normally
aren’t  made  of  a  single  crystal.  They  are  formed  from  many  crystalline  domains:  a 
polycrystalline  configuration.  I  show  a  schematic  of  a  polycrystal  here.  The  important 
thing  to  notice  is  that  the  atoms  within  a  domain  are  almost  undeformed  except  right 
next  to  the  domain  wall.  All  the  rotational  deformation  is  expelled  into  sharp  domain 
boundaries. 


IV.  THE  MYSTERY  OF  THE  CRYSTALS 

Normally, when you think of crystals, you think of diamonds, snowflakes, or maybe
salt crystals. These are single crystals: the sodium and chlorine atoms in a grain
of salt sit in registry all the way across the grain, giving it its cubical shape. Did
you know that metals form crystals? In the last paper, I mentioned dislocation
lines in a copper crystal. Metals don't have big facets and corners because they
are polycrystalline. The atoms in a metal also sit on a regular lattice, but the
metal breaks up into domains in which the lattices sit at various angles (Figure 9).
Because there are lots of small domains, copper doesn't form facets like salt grains
and snowflakes do.

Footnote: Some of you will think of wine glasses. They are made of glass and aren't crystals at all.

What Ming Huang (one of my students) and I have been trying to explain for
years is why those little domains form. It's easy to see that different regions might
grow with different orientations (Figure 10). When they touch, the different domains
will start pushing and twisting one another, trying to make one big domain. It isn't
hard to believe that they will stop growing after a while, fighting one another to a
standstill. What we've been trying to understand, though, is why the final state is
made of perfect little crystals separated by sharp domain walls.




FIGURE 10 Growing a Crystal from a Liquid: Forming a Polycrystal. Polycrystals
can  form  for  lots  of  reasons.  If  one  cools  a  liquid  quickly,  one  can  find  that  crystalline 
regions  can  form  in  many  different  places  almost  simultaneously.  Since  they  will 
have  random  orientations,  they  won’t  match  up  when  they  meet.  When  they  do 
meet,  rearrangements  of  atoms  will  occur  to  try  to  realign  and  merge  the  domains 
(coarsening).  As  we  continue  to  cool  and  wait,  this  process  will  eventually  stop,  leaving 
us  with  different  domains. 


Footnote: [...] crystals are sometimes found in nature. The growth takes place so slowly that a single
crystal can form. The same idea happens with rock candy: you get a glass if you cool sugar syrup
quickly, but if you evaporate a sugar solution slowly, you can get big crystals.
Now, I don't want to exaggerate. There are perfectly good explanations for
why crystals form domain walls. They just aren't as beautiful and general as they
might be. They don't fit in with the general ideas of broken symmetries and order
parameters: they apply only to crystals. Our explanation for why superconductors
don't  have  a  Goldstone  mode  was  perfectly  okay  before  Higgs  came,  too.  He  made 
it  beautiful  and  generalized  it  to  explain  something  completely  different.  Ming  and 
I  want  to  understand  grain  boundaries  in  a  way  which  will  make  simple  and  clear 
where  else  similar  phenomena  might  occur.  At  least,  we'd  like  to  understand  why 
focal  conics  occur  at  the  same  time.  Domains  formed  by  breaking  translational 
symmetry  in  one  direction  and  in  three  directions  should  have  the  same  kind  of 
explanation! 

FIGURE 11 Domain Wall. Here we see a single domain wall. Notice that the domain
wall can be also thought of as a series of dislocations. The strain field inside the crystal
due to a line of dislocations can be shown to decay exponentially, just as the magnetic
field dies away around a vortex line.

Figure 11 shows a domain wall in a crystal. The crystalline ground state rotates
as one crosses the domain wall. The atoms at the wall are quite unhappy. You'd
think that they would push and pull on their neighbors, and that there would be
strains leaking far into the crystal. This isn't true. In fact, there is a well-known
rule in the materials science literature that the strain field from a domain wall dies
away exponentially as one enters the grain.

Doesn’t  that  sound  like  a  Meissner  effect? 

There are more analogies. Crystals break both the translational and the rotational
symmetry of liquids. Many liquid crystals only break the rotational symmetry.
They have Goldstone rotational waves; if you rotate a large region inside a liquid
crystal, it will cost little energy and will slowly rotate back. When the translational
symmetry is also broken, the rotational Goldstone mode disappears! If I rotate
one piece of a crystal with respect to another, it costs an enormous energy
(Figure 12(a)). If I let the distorted crystal rearrange locally to reach equilibrium, the
rotational deformation is expelled into grain boundaries (Figure 12(b)), a process
known in the field as polygonalization. Just like the massless photon developed a
mass when the superconducting transition broke the gauge symmetry, the massless
rotational mode develops a mass when the translational symmetry is broken.

FIGURE 12 (a) Rotational distortion of a crystal. If we take a thick piece of metal and
rotate one end with respect to another, it will start by bending uniformly. As it continues
to bend, dislocations will form to ease the bending strain. These line dislocations will
start off distributed irregularly through the sample. (b) Domain walls form to expel
rotations. If we hold the rotation for a long time and let the dislocations move around,
they will lower their energy by arranging themselves into domain walls. Between the
domain walls we find undistorted crystal. This process is called polygonalization.

This is surely also related to some of the old problems in the topological theory
of defects. In describing a crystal, everybody uses the displacement field $u(x)$
and its derivatives. Now, as we saw in the last paper, $u(x)$ describes the broken
translational order, but not the broken orientational order. Why don't we also have
a rotation matrix $R(x)$? For example, in Figure 11, $R(x)$ shifts abruptly from one
side of the domain wall to the other. Mermin discusses some of the weird behavior
one gets following this path. The point is that $R(x)$ seems to be constrained: it
doesn't change on its own but follows the broken translational order. Keeping it
as an order parameter seems no more necessary than keeping $\rho = |\psi|$ around in a
superconductor: only $\theta$ is massless, and $\rho$ just wiggles around $\rho_0$ in a boring way.

Now, Ming and I have spent a huge amount of time trying to make these words
into a mathematical theory. (We started with smectics, then studied superconductors,
then thought about some ideas of Toner and Nelson, ...) Ming has gone on
to better things, and I'm still futzing with it. I can summarize where we are right
now. Suppose we consider a rotationally distorted two-dimensional crystal
(Figure 12(a)). We can define a rotational order parameter by looking at the angle of
the nearest-neighbor bonds:

$$R = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} . \tag{14}$$

The translational order parameter $u$ is just as it always was: if $x$ is the original
position and $p(x)$ is the corresponding position in the ideal lattice,

$$u(x) = p(x) - x . \tag{15}$$

Now, the free energy can only depend on gradients of $u$, since it is translationally
invariant. It also cannot change if we perform a uniform rotation: $R \to R_0 R$, $p \to R_0 p$.
From this, we can see that the free energy must be written in terms of gradients
of $R(x)$ and the particular combination


A  reasonable  free  energy  for  a  crystal  then  becomes 

=  (vef  +  2^^  A  J:c?,  +  .  (17) 

jy  \  /  ,•  \  / 


Footnote: This is analogous to the minimal coupling term $\nabla\theta - A$ in the free energy for a superconductor.
This is just the normal elastic energy everybody uses, except for the third term
multiplied by $\kappa$. Normally, the strain matrix $\epsilon$ is defined to be symmetric, so this
term is then zero.

Our free energy doesn't keep $\epsilon$ automatically symmetric precisely because we
have $R(x)$ as an independent degree of freedom. The antisymmetric part measures
the amount that $R$ disagrees with the local gradients of $u$. It turns out that this
antisymmetric part for the crystalline free energy is analogous to the current for
the superconductor, which has a Meissner effect.

There are several things I haven't been able to do, though. First, I don't think
$\epsilon_{12} - \epsilon_{21}$ is expelled quite like its analogue in the superconductor. I think we can
show, though, that it is a boring variable like $\rho$ was. Second, I haven't a clue on
how to show that grains exist. To show that grains exist I have to show a constraint
like $\nabla\theta = 0$!

We  started  this  paper  by  admiring  the  focal  conic  defects  in  smectic  liquid 
crystals:  beautiful  ellipses  and  hyperbolas  which  are  due  not  to  topology  but  to 
geometrical  consequences  of  a  constraint.  We  saw  how  constraints  can  be  enforced 
by  the  energy;  “massive”  modes  decay  exponentially.  We  saw  explicitly  how  this 
occurs  in  superconductors — the  magnetic  field  is  constrained  to  zero  because  the 
photon  and  the  Goldstone  boson  for  the  superconducting  gauge  symmetry  combine 
into a massive particle. Finally, we discussed analogous effects in the everyday
problem  of  grain  boundaries  in  crystals,  and  realized  that  we  don’t  really  understand 
them  in  a  deep  sense. 


APPENDIX:  THE  SMECTIC  ORDER  PARAMETER 

Here we derive the consequences for layered systems of the constraint that the layers
be equally spaced. Suppose that there is a stack of (bent) sheets, equally spaced
from one to the next, with separation $a$. Suppose that the unit normal to these
sheets at a position $x$ is given by $\hat{n}$. Consider traveling around a loop $C$, crossing
various layers as we go around (Figure 13). The number of layers we cross is given
by the line integral

$$\frac{1}{a}\oint_C \hat{n}\cdot d\boldsymbol{\ell} = \text{net \# crossed} . \tag{18}$$

If the layers exist throughout the region without any defects, then the net number
crossed around any closed loop must be zero. Using Stokes' theorem, this integral
over $C$ is equal to an integral over the area $A$ swept out by the curve:

$$\oint_C \hat{n}\cdot d\boldsymbol{\ell} = \int_A \operatorname{curl}\hat{n}\cdot dA . \tag{19}$$


Footnote: It is the gradient of $\mathcal{F}_{\text{crystal}}$ with respect to $\theta$, just as the current is the gradient of $\mathcal{F}_{\text{superconductor}}$
with respect to $A$. I thank Alan Luther for pointing this out.
FIGURE 13 Equally spaced layers imply $\operatorname{curl}\hat{n} = 0$. Smectic layers, with a loop
$C$ enclosing an area $A$. The dot product $\hat{n}\cdot d\boldsymbol{\ell}$ gives the cosine of the angle of the
curve $C$ with respect to the layers, and $a/\cos\theta$ is the length of curve $C$ between two
layers, so $(1/a)\int \hat{n}\cdot d\boldsymbol{\ell}$ gives the net number of layers crossed by the curve $C$. (A layer
crossed first forward and then backward cancels, of course.) Since in a closed loop the
net number of layers crossed must be zero (assuming no dislocations), this must be
zero. By Stokes' theorem, $\oint \hat{n}\cdot d\boldsymbol{\ell} = \int \operatorname{curl}\hat{n}\cdot dA$. This is true for any little area $A$,
so $\operatorname{curl}\hat{n} = 0$.


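But for this to be true for all areas $A$, $\operatorname{curl}\hat{n}$ must be zero.

Now, we already know that $\hat{n}^2 = 1$. The derivative $\partial\hat{n}^2/\partial x_\alpha$, of course, must
be zero, so using the product rule

$$\frac{\partial \hat{n}^2}{\partial x_\alpha} = 2\sum_\beta n_\beta \frac{\partial n_\beta}{\partial x_\alpha} = 0 . \tag{20}$$

Now, since we know $\operatorname{curl}\hat{n} = 0$, we know from Eq. (9) that

$$\frac{\partial n_\beta}{\partial x_\alpha} = \frac{\partial n_\alpha}{\partial x_\beta} . \tag{21}$$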
Finally, combining these, we find

$$\sum_\beta n_\beta \frac{\partial n_\alpha}{\partial x_\beta} = (\hat{n}\cdot\nabla)\hat{n} = 0 . \tag{22}$$

This implies that $\hat{n}$ doesn't change when you move in the $\hat{n}$ direction. This means
that $\hat{n}$ will be perpendicular to the next layer as well: that is, a straight line
perpendicular to one layer will be perpendicular to every layer it crosses.
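Both conditions can be checked by finite differences on a concrete layered pattern. The sketch below uses two illustrative fields chosen here, not taken from the text: concentric circular layers (equally spaced, with radial normal) and the level sets of $x + 0.3y^2$ (translated parabolas, which are not equally spaced along their normals). The function names and step size are assumptions made for the example.

```python
import math

def radial_normal(x, y):
    """Unit normal to concentric circular layers centred on the origin."""
    r = math.hypot(x, y)
    return (x / r, y / r)

def parabola_normal(x, y):
    """Unit normal to the level sets of x + 0.3*y**2 (not equally spaced layers)."""
    gx, gy = 1.0, 0.6 * y
    g = math.hypot(gx, gy)
    return (gx / g, gy / g)

def curl_z(field, x, y, h=1e-5):
    """z-component of the curl, d n_y/dx - d n_x/dy, by central differences."""
    dny_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dnx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dny_dx - dnx_dy

def along_normal_derivative(field, x, y, h=1e-5):
    """(n . grad) n: the change of n when stepping along n itself."""
    nx, ny = field(x, y)
    fwd = field(x + h * nx, y + h * ny)
    bwd = field(x - h * nx, y - h * ny)
    return ((fwd[0] - bwd[0]) / (2 * h), (fwd[1] - bwd[1]) / (2 * h))

print("circles:   curl =", curl_z(radial_normal, 1.0, 2.0),
      " (n.grad)n =", along_normal_derivative(radial_normal, 1.0, 2.0))
print("parabolas: curl =", curl_z(parabola_normal, 1.0, 1.0))
```

For the circles both quantities vanish to numerical accuracy, and the generators are the radial lines, which all cross at the centre. For the parabolas the curl is visibly nonzero, so that pattern cannot be a stack of equally spaced layers.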

These perpendicular lines are called generators. We qualitatively knew already
that  one  layer  determined  its  surroundings;  now  we  have  a  simple  geometrical  rule 
describing  this  nonlocal  constraint.  For  your  information,  the  defects  occur  where 
the  generators  cross  (as  shown  in  Figure  3);  this  surface  is  called  the  evolute,  or 
surface  of  centers,  for  the  layer. 
Fractal Time Dynamics:

From  Glasses  to  Turbulence 


One  major  theme  in  physics  has  been  to  find  the  right  scale  for  a  problem. 
How  high  can  the  highest  mountains  be  on  Earth  and  Mars?  What  is  the  size  of  a 
hydrogen  atom?  What  is  the  density  of  the  atmosphere  at  35,000  feet?  These  types 
of  problems  represent  great  successes  in  physics,  in  part  because  the  predictions 
can  be  tested,  and  agreement  leads  to  confidence  in  understanding  basic  physics. 
These types of problems are prevalent in the teaching of physics.

Another, newer theme is to investigate problems where many scales enter, but
no scale dominates, i.e., there is no characteristic size. The struggles and successes
of meeting the challenge of scale invariance in the field of phase transitions are well
known. The scaling enters through the divergence of a correlation length which occurs
at a finite temperature. Clustering of correlations (spin clusters, lattice gas particle
clusters, etc.) occurs, and one cluster percolates through the material when the
correlation length diverges. The modern method for approaching these problems
is the renormalization group, which also allows the (usually non-integer)
critical exponents to be calculated from fixed points of transformation equations
which relate different scales to each other. Mandelbrot, throughout his long career,
has shown that a fractal geometry can underlie the appearance of non-integer
exponents, from describing the coastline of England to the 2.5 dimensions of a 3-D
percolation cluster. The idea of a fractal dimension (or physics in non-integer
dimensions) has become commonplace in many fields of science, including the physical,
engineering, biological, social, and economic sciences.

In these lectures, the struggles and successes of applying scaling ideas to transport
and relaxation in amorphous materials will be presented. The first section will
rely heavily on the notion of fractal time, our first topic for discussion. Fractal-time
random walks will prove useful for understanding scaling properties of glassy materials.
Next, we will turn our attention to random walks with fractal trajectories,
called Lévy flights. When Kolmogorov space-time scaling is applied to these Lévy
flights, a novel view of turbulent diffusion arises.


1. FRACTAL TIME

Let $\psi(t)$ be the probability density that the duration between events is $t$. This
might represent the time it takes for a particle to jump out of a trap. A simple
choice for $\psi(t)$ is the exponential, $\psi(t) = \lambda\exp(-\lambda t)$. This density has a well-defined
mean time $\langle t\rangle = 1/\lambda$. Let us investigate $\psi(s)$, the Laplace transform of $\psi(t)$. We
will be interested in the behavior at asymptotically long times (or, equivalently, small $s$):

$$\psi(s) = \int_0^\infty \exp(-st)\,\psi(t)\,dt = 1 - s\langle t\rangle + O(s^2) . \tag{1}$$


The appearance of $s$ to the first power is, in some sense, in agreement with the notion
that time is one dimensional, and that it flows smoothly forward. The general result
of Eq. (1) is certainly true for the case $\psi(t) = \lambda\exp(-\lambda t)$, where

$$\psi(s) = \frac{\lambda}{\lambda + s} .$$

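What would happen if $\langle t\rangle$ were infinite? If $\psi(t) \sim t^{-1-\beta}$ $(0 < \beta < 1)$ as $t \to \infty$,
then $\psi(t)$ is normalizable, but its first moment diverges. What would $\psi(s)$ look like
in this case? Certainly, the expansion in integer powers of $s$ will not apply when $\langle t\rangle$
is infinite. Let us consider a $\psi(t)$ which is a sum of exponentials where scales of all
orders of magnitude enter:

$$\psi(t) = \frac{1-N}{N}\sum_{j=1}^{\infty} N^j\lambda^j \exp(-\lambda^j t), \qquad (\lambda < N < 1) . \tag{2}$$

The mean of $\psi(t)$ is given by

$$\langle t\rangle = \int_0^\infty t\,\psi(t)\,dt = \frac{1-N}{N}\sum_{j=1}^{\infty}\left(\frac{N}{\lambda}\right)^j = \infty . \tag{3}$$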
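The two properties just claimed, normalizability and a divergent mean, are easy to check numerically. The sketch below assumes the sum-of-exponentials form $\psi(t) = \frac{1-N}{N}\sum_j N^j\lambda^j e^{-\lambda^j t}$ with the illustrative values $N = 0.5$, $\lambda = 0.25$ (so $\beta = \ln N/\ln\lambda = 0.5$); the function names and the truncation at 200 terms are choices made here.

```python
import math

def psi(t, N=0.5, lam=0.25, terms=200):
    """Waiting-time density ((1-N)/N) * sum_j (N*lam)^j * exp(-lam^j * t)."""
    pref = (1.0 - N) / N
    return pref * sum((N * lam) ** j * math.exp(-(lam ** j) * t)
                      for j in range(1, terms + 1))

def normalization(N=0.5, terms=200):
    """Integral of psi over t, term by term: the j-th exponential contributes N^j."""
    return (1.0 - N) / N * sum(N ** j for j in range(1, terms + 1))

def mean_partial_sum(J, N=0.5, lam=0.25):
    """First J terms of <t> = ((1-N)/N) * sum_j (N/lam)^j; unbounded when N > lam."""
    return (1.0 - N) / N * sum((N / lam) ** j for j in range(1, J + 1))

print("normalization:", normalization())
print("partial sums of <t>:", [mean_partial_sum(J) for J in (5, 10, 20)])
# Stretching time by 1/lam rescales the tail by N*lam, i.e. psi ~ t^(-1-beta).
print("psi(400)/psi(100):", psi(400.0) / psi(100.0), " N*lam:", 0.5 * 0.25)
```

The partial sums of the mean grow geometrically, while the tail ratio settles at $N\lambda$, which is the discrete form of the power law $t^{-1-\beta}$.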
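To make further progress in analyzing $\psi(t)$, let us look at its Laplace transform
$\psi(s)$,

$$\psi(s) = \int_0^\infty \exp(-st)\,\psi(t)\,dt = \frac{1-N}{N}\sum_{j=1}^{\infty}\frac{(N\lambda)^j}{s+\lambda^j} . \tag{4}$$

Note that $\psi(s)$ satisfies a scaling relation,

$$\psi(s) = N\psi(s/\lambda) + \frac{[1-N]\lambda}{s+\lambda} , \tag{5}$$

whose homogeneous part $\psi(s) = N\psi(s/\lambda)$ has a solution of the form

$$\psi(s) = s^\beta, \quad \text{with} \quad \beta = \frac{\ln N}{\ln\lambda} . \tag{6}$$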

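This hints that an expansion of $\psi(s)$ for small $s$ will involve non-integer powers of
$s$. An exact analysis can be performed by substituting, for $1/[s+\lambda^j]$ in Eq. (4), its
own inverse Mellin transform to find

$$2\pi i\,\psi(s) = \frac{1-N}{N}\sum_{j=1}^{\infty} N^j \int_{c-i\infty}^{c+i\infty} s^{-\epsilon}\lambda^{j\epsilon}\,\frac{\pi}{\sin(\pi\epsilon)}\,d\epsilon \tag{7}$$

$(0 < c = \operatorname{Re}\epsilon < 1)$. Interchanging the sum and integral yields

$$2\pi i\,\psi(s) = \frac{1-N}{N}\int_{c-i\infty}^{c+i\infty} \frac{s^{-\epsilon}\,\pi N\lambda^{\epsilon}}{\sin(\pi\epsilon)\{1 - N\lambda^{\epsilon}\}}\,d\epsilon . \tag{8}$$

The integrand has simple poles from the $\sin(\pi\epsilon)$ term at $\epsilon = 0, \pm 1, \pm 2, \ldots$ and from
the factor in the denominator when $\epsilon = -\ln N/\ln\lambda \pm 2\pi i j/\ln\lambda$ $(j = 0, 1, 2, \ldots)$.
Translating the contour to $\operatorname{Re}\epsilon = -\infty$ and taking account of the poles crossed, we
find for $s < 1$ that

$$\psi(s) = 1 + s^\beta K(s) + \frac{1-N}{N}\sum_{j=1}^{\infty}\frac{(-1)^j s^j N}{\lambda^j - N} ,$$

with $\beta$ given by Eq. (6), and where $K(s)$ is periodic in $\ln s$ with period $\ln\lambda$, as
given by

$$K(s) = -\frac{1-N}{N\ln\lambda}\sum_{j=-\infty}^{\infty}\frac{\pi\exp(-2\pi i j\ln s/\ln\lambda)}{\sin(\pi x)} , \tag{9}$$

where $x = -\ln N/\ln\lambda + 2\pi i j/\ln\lambda$.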

While the above has been somewhat technical, it should assure the reader that
expansions in non-integer exponents are legitimate and arise in probability theory
for transforms of temporal probability distributions which possess infinite first
moments. We will encounter similar examples for random walk jump distributions
whose second moments are infinite.

The $s^\beta$ term, somehow, implies that time is a $\beta$-dimensional quantity with
$\beta < 1$. When $\langle t\rangle$ is finite, this implies $\beta = 1$. In this sense unity is the upper critical
dimension of time. A simulation of the above temporal process would find waiting
time durations of $1/\lambda$, $1/\lambda^2$, $1/\lambda^3$, etc., with waits an order of magnitude longer
(in base $\lambda$) occurring an order of magnitude less often (in base $N$). The similarity
definition of a fractal dimension is log(# subclusters/cluster)/log(scale factor).
For our fractal-time waiting process, the jumps occur in self-similar clusters with
about $N$ jumps, each separated in time by $1/\lambda$, occurring before a wait of $1/\lambda^2$
occurs. Then another cluster of about $N$ closely spaced (in time) jumps occurs
before another wait of $1/\lambda^2$ occurs. After about $N$ of these clusters of jumps occur,
a wait of $1/\lambda^3$ arises, and so on in a hierarchical fashion. We find self-similar
clusters of jumps with about $N$ subclusters/cluster and differing in time
duration between jumps by about a factor of $\lambda^{-1}$. This is in accord with treating
$\beta$ as the fractal dimension of the process. The jumps do not occur at a nice
well-defined rate, but in a very patchy manner. If one made marks on a time axis when
jumps occur, then the set of points would look like the points in a random Cantor
set with fractal point set dimension $\beta$. The number of points $M(T)$ on a line of
length $T$, versus $M(T/\lambda)$, the number of points on a line of length $T/\lambda$, gives the
relation $M(T) = N M(T/\lambda)$. This equation has a solution in the form $M(T) = T^\beta$
with $\beta = \ln N/\ln\lambda$.
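A simulation of this hierarchical waiting process is straightforward if the density is taken to be the sum of exponentials $\psi(t) = \frac{1-N}{N}\sum_j N^j\lambda^j e^{-\lambda^j t}$: the weight of level $j$ is $(1-N)N^{j-1}$, so one can first pick a level and then draw an exponential wait with rate $\lambda^j$. The sketch below does this; the function names, the seed, and the values $N = 0.5$, $\lambda = 0.25$ are illustrative assumptions.

```python
import random

def sample_wait(rng, N=0.5, lam=0.25):
    """Draw one waiting time: level j with probability (1-N)*N^(j-1),
    then an exponential wait with rate lam^j (mean 1/lam^j)."""
    j = 1
    while rng.random() < N:      # move to the next, slower level with probability N
        j += 1
    return j, rng.expovariate(lam ** j)

def level_histogram(n_samples=100_000, seed=0, N=0.5, lam=0.25):
    """Count how often each level of the hierarchy is drawn."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        j, _wait = sample_wait(rng, N, lam)
        counts[j] = counts.get(j, 0) + 1
    return counts

counts = level_histogram()
for j in sorted(counts)[:6]:
    print("level", j, "typical wait", 0.25 ** -j, "drawn", counts[j], "times")
```

Each level has a typical wait longer by a factor $1/\lambda$ than the one before and is drawn less often by a factor $N$, which is the patchy, clustered structure described above.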


2.  STRETCHED  EXPONENTIAL  RELAXATION  (A  FRACTAL 
TIME-INITIATED PROCESS)

Let us consider a model of relaxation for a glassy material. Suppose a glassy material
supports the motion of defects. The defects are viewed as encapsulating free volume
which when transported to a frozen-in region of the glass can cause a relaxation to
occur. The frozen-in part of the material may be a dipole (so dielectric relaxation
occurs) or a polymer chain segment (so mechanical relaxation occurs), etc. Let us
assume that there are $N$ mobile defects in the material which can move between
$V$ sites. Assume there is a frozen-in region at the origin of our coordinate system, and
that the $N$ defects are randomly placed in the material. Let $\phi(t)$ be the probability
that no defect has reached the origin by time $t$. We formally write $\phi(t)$ as

$$\phi(t) = \left[1 - \frac{1}{V}\sum_{r}\int_0^t F(r,\tau)\,d\tau\right]^N , \tag{10}$$


where $F(r,\tau)$ is the probability density for a random walker starting at site $r$ to
reach the origin at time $\tau$. $F$ is called a first passage time probability. The integral
allows for a first passage of a walker to the origin in the time interval $(0,t)$. This
encounter will cause the relaxation to occur. Note we sum over all initial starting
points for each walker, and the probability of starting from any site is $1/V$. Thus,
the term within the brackets is the probability that a particular random walker did
not reach the origin by time $t$. We raise the bracket to the $N$th power to account
for none of the $N$ random walkers reaching the origin by time $t$. Let us take a
thermodynamic limit of $N \to \infty$, $V \to \infty$, but $c = N/V$ remaining constant. In
this limit $\phi(t)$ becomes

$$\phi(t) = \exp\left[-c\sum_{r}\int_0^t F(r,\tau)\,d\tau\right] . \tag{11}$$


The argument in the brackets is minus the flux of random walkers into the origin at
time $t$. This is the type of expression Smoluchowski would have written to describe
this reaction scheme. We can rewrite Eq. (11) by noting that any of the sites from
which a walker can reach the origin within a time $t$ are, by symmetry, the same
set of sites a walker starting at the origin can reach by time $t$. We can now write
Eq. (11) as

$$\phi(t) = \exp[-cS(t)] \tag{12}$$

where $S(t)$ is the distinct number of sites a random walker starting at the origin
visits within a time $t$. The walker may make 100 jumps, but only visit 28 different
sites. $S(t)$ would then be 28, and walkers starting from these 28 sites could reach
the origin within time $t$. Let us now take an interlude into the theory of random
walks to learn how to calculate $S(t)$.
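Before the analytic treatment, the quantity in question can simply be counted in a simulation. The sketch below follows a nearest-neighbour walker on a simple cubic lattice and records how many different sites it has touched; the lattice, the function names, and the sample sizes are illustrative assumptions rather than anything specified in the text.

```python
import random

MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def distinct_sites(n_steps, rng):
    """Number of distinct sites (origin included) visited by an n-step walk."""
    pos = (0, 0, 0)
    visited = {pos}
    for _ in range(n_steps):
        dx, dy, dz = rng.choice(MOVES)
        pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        visited.add(pos)
    return len(visited)

def mean_distinct_sites(n_steps, n_walks=2000, seed=0):
    """Average of distinct_sites over independent walks."""
    rng = random.Random(seed)
    return sum(distinct_sites(n_steps, rng) for _ in range(n_walks)) / n_walks

for n in (100, 200, 400):
    print(n, "steps ->", mean_distinct_sites(n), "distinct sites on average")
```

In three dimensions the average grows roughly in proportion to the number of steps, with a sizable fraction of the jumps wasted on revisiting old sites.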


MONTROLL-WEISS CONTINUOUS-TIME RANDOM WALKS

Let us first consider the statistics of $n$-step random walks. We will not yet focus on
the statistics of the random time intervals occurring between jumps. Let's begin with an
equation for $P_{n+1}(r)$, the probability that a random walker beginning at the origin
reaches site $r$ on its $(n+1)$st step. We can write this probability in terms of the
probability $p(r)$ that a single step has a displacement $r$, via

$$P_{n+1}(r) = \sum_{r'} P_n(r - r')\,p(r') . \tag{13}$$

It is useful to introduce a generating function $G(r;z)$ defined as

$$G(r;z) = \sum_{n=0}^{\infty} P_n(r)\,z^n . \tag{14}$$

Multiplying both sides of Eq. (13) by $z^{n+1}$ and summing over all $n$, let us rewrite
Eq. (13) as a Green's function equation

$$G(r;z) - z\sum_{r'} G(r - r';z)\,p(r') = \delta_{r,0} . \tag{15}$$
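The Green's function equation can be checked numerically by building $P_n(r)$ from the recursion and summing the series for $G(r;z)$. The sketch below uses a nearest-neighbour walk on a one-dimensional lattice as an illustrative choice (for which $G(0;z) = 1/\sqrt{1-z^2}$ is known in closed form); the function names and the truncation at 60 steps are assumptions made here.

```python
def step_distribution():
    """Nearest-neighbour walk on a 1-D lattice: p(+1) = p(-1) = 1/2."""
    return {1: 0.5, -1: 0.5}

def n_step_probabilities(n_max, p):
    """P_n(r) for n = 0..n_max from P_{n+1}(r) = sum_r' P_n(r - r') p(r')."""
    P = [{0: 1.0}]                      # the walker starts at the origin
    for _ in range(n_max):
        nxt = {}
        for r, prob in P[-1].items():
            for dr, w in p.items():
                nxt[r + dr] = nxt.get(r + dr, 0.0) + prob * w
        P.append(nxt)
    return P

def generating_function(P, z):
    """G(r; z) = sum_n P_n(r) z^n, truncated at the largest n available."""
    G = {}
    for n, Pn in enumerate(P):
        for r, prob in Pn.items():
            G[r] = G.get(r, 0.0) + prob * z ** n
    return G

def green_residual(G, p, z, r):
    """G(r;z) - z * sum_r' G(r - r'; z) p(r') - delta_{r,0}."""
    conv = sum(G.get(r - dr, 0.0) * w for dr, w in p.items())
    return G.get(r, 0.0) - z * conv - (1.0 if r == 0 else 0.0)

p = step_distribution()
G = generating_function(n_step_probabilities(60, p), 0.5)
print("largest residual:", max(abs(green_residual(G, p, 0.5, r)) for r in range(-5, 6)))
print("G(0; 0.5) =", G[0], " closed form:", 0.75 ** -0.5)
```

The residual is not exactly zero only because the series is truncated; what is left over is of order $z^{61}$.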
We can also write $P_n(r)$ as

$$P_n(r) = \sum_{m=0}^{n} F_{n-m}(r)\,P_m(0) + \delta_{n,0}\,\delta_{r,0} , \tag{16}$$

where $F_n(r)$ is the probability that on the $n$th step the walker reaches site $r$ for the
first time. This calculation allows for the walker reaching site $r$ for the first time
after $n - m$ steps and then returning in $m$ steps (i.e., zero displacement in the last
$m$ steps). Let us now define the generating function

$$F(r;z) = \sum_{n=0}^{\infty} F_n(r)\,z^n . \tag{17}$$

Multiplying Eq. (16) by $z^n$ and summing over $n$ (and taking advantage of the convolution
form of the equation) leads to the following generating function equation:

$$F(r;z) = \frac{G(r;z) - \delta_{r,0}}{G(r=0;z)} . \tag{18}$$

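The number of distinct sites, $S_n$, visited in an $n$-step random walk is closely related
to the first passage time probabilities by

$$S_n = 1 + \sum_{r \neq 0}\,[F_1(r) + \cdots + F_n(r)] . \tag{19}$$

Forming a generating function for $S_n$, we find

$$\begin{aligned}
S(z) &= \sum_{n=0}^{\infty} S_n z^n \\
&= \frac{1}{1-z} + z\sum_{r \neq 0} F_1(r) + z^2\sum_{r \neq 0}\,[F_1(r) + F_2(r)] + \cdots + z^n\sum_{r \neq 0}\,[F_1(r) + \cdots + F_n(r)] + \cdots \\
&= \frac{1}{1-z} + \frac{1}{1-z}\sum_{r \neq 0}\,[zF_1(r) + \cdots + z^nF_n(r) + \cdots] \\
&= \frac{1}{(1-z)^2\,G(r=0;z)} .
\end{aligned} \tag{20}$$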
The next stage of complication is to introduce the waiting time density $\psi(t)$ governing
the duration between steps. Let $\psi_n(t)$ be the probability density that the
$n$th jump occurs at time $t$. This can be written in terms of $\psi(t)$ as

$$\psi_n(t) = \int_0^t \psi_{n-1}(\tau)\,\psi(t-\tau)\,d\tau .$$

Taking Laplace transforms, we find $\psi_n(s) = [\psi(s)]^n$. The probability density to
reach site $r$ for the first time at time $t$ is given by

$$F(r,t) = \sum_{n=0}^{\infty} F_n(r)\,\psi_n(t) .$$

Its Laplace space representation has the form of a generating function

$$F(r,s) = \int_0^\infty \exp(-st)\,F(r,t)\,dt = \sum_{n=0}^{\infty} F_n(r)\,[\psi(s)]^n . \tag{22}$$

We can now see that by replacing z by $\psi(s)$ in Eq. (20) we have an expression for the generating function for visiting n distinct sites when the nth jump occurs at time t. We need just one more adjustment to take account of the nth jump occurring at time $t - \tau$ and no jump occurring in the remaining time, i.e., n jumps occur but the nth jump occurs before time t. We write S(t) as

$$S(t) = \sum_{n=0}^{\infty} S_n \int_0^t \psi_n(t - \tau)\,W(\tau)\,d\tau, \tag{23}$$

where $W(\tau) = \int_{\tau}^{\infty} \psi(u)\,du$ is the probability that the time between jumps is longer than $\tau$. The Laplace transform of W(t) is $[1 - \psi(s)]/s$. Using Eq. (20), we write S(t), in Laplace space, as

$$S(s) = \frac{S(z = \psi(s))\,[1 - \psi(s)]}{s} = \frac{1}{s\,[1 - \psi(s)]\,G(0; \psi(s))}. \tag{24}$$

In three dimensions, at long times (small s), $G(0; z) \sim \text{const.} + O(1 - z)$.

For a continuous-time random process where $\langle t\rangle$ is finite, $\psi(s) \sim 1 - s\langle t\rangle$ and $S(s) \sim 1/s^2$, so $S(t) \sim t$. The fractal time case of $\langle t\rangle$ infinite is more interesting. There we have $\psi(s) \sim 1 - s^{\beta}$, with $\beta < 1$, so $S(s) \sim 1/s^{1+\beta}$ and $S(t) \sim t^{\beta}$.
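The sublinear growth can be illustrated numerically. In three dimensions $G(0; z)$ tends to a constant, so the number of distinct sites is proportional to the number of jumps completed by time t. The sketch below therefore only counts jumps. The Pareto waiting-time law, the value $\beta = 0.5$, and the sample sizes are assumptions chosen for illustration.

```python
import numpy as np

def mean_jump_count(beta, times, walkers=1000, max_jumps=1000, seed=0):
    """Average number of jumps completed by each observation time.

    Waiting times are Pareto distributed, P(wait > t) = t**(-beta) for t >= 1,
    so the mean wait is infinite whenever beta < 1 (the fractal time case).
    max_jumps must be large enough that the last jump falls after max(times).
    """
    rng = np.random.default_rng(seed)
    u = 1.0 - rng.random((walkers, max_jumps))    # uniform on (0, 1]
    waits = u ** (-1.0 / beta)
    jump_times = np.cumsum(waits, axis=1)          # time of the nth jump
    return np.array([(jump_times <= t).sum(axis=1).mean() for t in times])

beta = 0.5
times = np.array([1e2, 1e4])
counts = mean_jump_count(beta, times)
slope = np.log(counts[1] / counts[0]) / np.log(times[1] / times[0])
print(f"measured growth exponent: {slope:.2f} (beta = {beta})")
```

The measured exponent should sit near $\beta$ rather than near 1, with small deviations from the finite sample and finite observation times.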


BACK  TO  THE  RELAXATION  LAW 


Let us now substitute our calculation for S(t) back into the relaxation law, Eq. (12):

$$\phi(t) \sim \begin{cases} \exp(-ct), & \langle t\rangle \text{ finite}; \\ \exp(-ct^{\beta}), & \langle t\rangle \text{ infinite}. \end{cases} \tag{25}$$


We  arrive  at  simple  exponential  decay  if  a  time  scale  exists,  and  stretched  exponen¬ 
tial  decay  if  it  doesn’t.  Both  cases  are  examples  of  probability  limit  distributions. 
This can explain the ubiquity of the stretched exponential law for glassy materials.

3. DIVERGENCE OF THE TIME SCALE

If we rewrite $\phi(t) = \exp(-ct^{\beta})$ as

$$\phi(t) = \exp\left[-(t/\tau)^{\beta}\right],$$

then

$$\tau = c^{-1/\beta}. \tag{26}$$


It is known that for many glassy materials an empirical fit to data yields

$$\tau = \tau_0 \exp\left[\frac{\text{const.}}{T - T_0}\right], \tag{27}$$

with $\tau_0$ and $T_0$ being constants. This is known as the Vogel-Fulcher-Tammann law. In our model, the defects can cluster as the temperature is lowered, which lowers the entropy of the system. This clustering would be due to an attractive interaction between the defects. We assume a lattice gas model of defect versus non-defect sites. In a mean field model, the correlation length diverges with temperature as $\xi \propto 1/(T - T_0)^{1/2}$. Let us assume that singlet defects are more mobile than doublet, triplet, ... clusters, so we will focus on the temperature-dependent population of the singlets, which we denote by $c_1$. The probability that a defect is at a site and that it is not correlated with other defects is given by

$$c_1 = c(1 - c)^{V_\xi}, \tag{28}$$

where $\xi$ is the correlation length, and $V_\xi = \xi^3$ is the correlation volume. For our lattice gas model of the defects $c_1 \sim \exp(-cV_\xi) \sim \exp(-c/[T - T_0]^{3/2})$, and thus using Eq. (26) we find

$$\tau \sim \exp\left([T - T_0]^{-3/2}\right). \tag{29}$$

While this differs from the Vogel law by having an exponent of 3/2 versus 1, it has proven in several comparisons to provide the better fit.
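To see how the two forms differ as $T_0$ is approached, the short sketch below evaluates both. The prefactor `tau0`, the constant `const`, and the temperatures are hypothetical values picked only for illustration, not fitted parameters.

```python
import math

def tau_vogel(T, T0, tau0=1.0, const=1.0):
    """Vogel-Fulcher-Tammann form, Eq. (27): tau0 * exp(const / (T - T0))."""
    return tau0 * math.exp(const / (T - T0))

def tau_clustering(T, T0, tau0=1.0, const=1.0):
    """Defect-clustering form, Eq. (29): tau0 * exp(const / (T - T0)**1.5)."""
    return tau0 * math.exp(const / (T - T0) ** 1.5)

T0 = 100.0
for T in (104.0, 102.0, 101.0, 100.5):
    print(T, tau_vogel(T, T0), tau_clustering(T, T0))
```

With these illustrative constants the two laws cross where $T - T_0 = 1$; closer to $T_0$ the 3/2 law diverges faster.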

If the defects do not cluster, then one can still obtain the stretched exponential law, but one need not obtain a Vogel-type law. This is the case for SiO2: $\phi$ is a stretched exponential, but $\tau$ is Arrhenius. This is consistent with our model, as the law for $\phi(t)$ focuses on how a defect moves, while the law for $\tau$ focuses on the temperature dependence of the mobile defect population. High above the glass transition temperature, $T_g$, many mobile defects exist. Their movement breaks up rigidity in the material. As the temperature is lowered, the clustering of defects reduces the mobile defect population. At $T_g$ presumably the decline in mobile defects allows rigidity to percolate through the sample. This is the glass transition. Below $T_g$, mobile defects still exist and relaxation still occurs. The time-scale dynamics for $\tau$ is focused on $T_0$, the temperature at which mobile defects disappear, and not on $T_g$. This is why $T_0 < T_g$.

4. FRACTAL RANDOM WALKS IN TURBULENCE

RICHARDSON’S  LAW 

In 1926, Lewis Fry Richardson discussed his discovery that the usual Brownian diffusion law for a mean square displacement, $\langle R^2(t)\rangle = 6Dt$, does not hold for diffusion in a turbulent fluid. In his studies of the dispersal of smoke from smokestacks on windy days, and of the separation of floating objects in turbulent waters, Richardson announced that

$$\langle R^2(t)\rangle = Dt^3 \quad \text{(for turbulent diffusion)}. \tag{30}$$

In trying to understand this result, Richardson drew pictures somewhat reminiscent of fractal patterns, trying to show how a drop of dye would be pulled apart, over many scales, in a turbulent flow. He felt that eddies of all sizes would cause the relative separation of two particles in a turbulent flow to have a diffusion constant which depended on their positions. Taking the diffusion law $\langle R^2(t)\rangle = Dt$ with $D = D(R) = R^{4/3}$, Richardson could recover the correct scaling of Eq. (30), but with so many scales entering a turbulent flow, he wondered how to describe turbulent trajectories mathematically. He wrote, “The failure of the dispersal of a point-charge to serve as a mathematical element, from which the dispersal of an extended system may be built up, appears to be intimately connected with the fact that in the atmosphere dispersal goes on in patches.” Differential equations are local, so they cannot possibly properly treat global spatial-temporal motions set up by a hierarchy of vortices. This led Richardson to even question the existence of differential equations for the description of turbulent flows. He asked, “Does the Wind Possess a Velocity? This question at first sight foolish improves upon acquaintance.” Richardson then gave the Weierstrass function as an example of an everywhere non-differentiable function and thought it might somehow be connected with turbulent flows.
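The scaling step from a separation-dependent diffusion constant to the cubic law can be written out. The sketch below treats the diffusion law as a local growth rate, which is an assumption made here for the sake of the dimensional argument and not a derivation taken from Richardson.

```latex
\frac{d\langle R^2\rangle}{dt} \sim D(R) \sim R^{4/3} = \left(R^2\right)^{2/3}
\quad\Longrightarrow\quad
\left(R^2\right)^{-2/3}\, d\!\left(R^2\right) \sim dt
\quad\Longrightarrow\quad
\left(R^2\right)^{1/3} \sim t
\quad\Longrightarrow\quad
\langle R^2(t)\rangle \sim t^3 .
```

The same result follows from inserting $D \sim R^{4/3}$ directly into $R^2 \sim Dt$, which gives $R^{2/3} \sim t$.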

We  will  take  a  circuitous  route  to  deriving  Richardson’s  law.  We  will  begin 
by discussing the Central Limit Theorem for adding random variables whose second
moments exist. We will then study Levy's generalization for summing random
variables  with  infinite  moments.  By  introducing  trajectories  associated  with  infi¬ 
nite  moment  random  variable  sums,  we  will  be  able  to  describe  a  scale-invariant 
random-walk  process.  Finally,  incorporating  Kolmogorov  space-time  scaling  into 
these  trajectories,  we  will  arrive  at  Richardson’s  law.  Incidentally,  along  the  way 
we  will  use  the  Weierstrass  function,  mentioned  by  Richardson,  as  a  generator  for 
our scale-invariant fractal space-time random walks.

LEVY FLIGHTS

Add up several identically distributed random variables $X_j$, each with zero mean

=  +  +  (31) 

with  the  condition  that  value  of  each  variable  X,  can  be  thought 

of  as  a  step  in  a  random  walk.  Each  jump  length  is  chosen  from  a  distribution  p{x). 
Levy asked the question of when can the distribution of the sum of n steps $P_n(x)$ (up
to some scale factors) be the same as the distribution of any term in the sum, p(x).
This  is  basically  the  question  of  fractals:  when  does  the  whole  (the  sum)  look  like 
one  of  its  parts?  One  answer  to  this  question  is  well  known.  A  sum  of  Gaussians  is 
a  Gaussian.  Setting  each  =  1  for  /?  =  2,  we  obtain  =  n.  This  means  for  adding 
Gaussians that the variance of the sum is the sum of the variances. The distribution of the $X_j$'s is $p(x) = (2\pi)^{-1/2}\exp(-x^2/2)$, and the distribution of $Y_n$ is $P_n(x) = (2\pi n)^{-1/2}\exp(-x^2/2n)$. So p(x) and $P_n(x)$ have the same distribution up to the scale factor n. In Fourier space ($x \to k$), $P_n(k)$ has a simple form,

$$P_n(k) = \int_{-\infty}^{\infty} P_n(x)\exp(ikx)\,dx = \exp(-nk^2/2). \tag{32}$$


Note that the second moment of $P_n(x)$ is given by $-d^2P_n(k = 0)/dk^2 = n$. Levy discovered that other solutions existed for Eq. (31) such that $P_n(x)$ and p(x) had the same distribution. He found this to be the case when

$$P_n(k) = \exp(-\text{const.}\,|k|^{\beta}) \quad (\text{for } 0 < \beta < 2). \tag{33}$$


The $\beta = 2$ case is the Gaussian which we have just studied. For $\beta < 2$, we note that $\langle x^2\rangle = -d^2P_n(k = 0)/dk^2$ is infinite. These random walks with steps with infinite second moments are known as Levy flights. It now seems obvious that to have scale-invariant distributions, we would need to sum up random variables with no scale. As we saw above, if we have finite second moments, then we will get the Gaussian distribution.

The exponent $\beta$ will turn out to be the dimension of the point set visited by a Levy flight. For the Gaussian case where $\beta = 2$, consider a random walk of $N^2$ steps. The probability distribution $P_n(x)$ is a Gaussian of standard deviation N. Each of the $N^2$ jumps has a Gaussian distribution with a standard deviation $\langle x^2\rangle^{1/2}$ = unity, so one can consider the distribution after $N^2$ steps to be comprised of $N^2$ Gaussians, each scaled down in standard deviation from N to unity. Thus a fractal dimension of $\ln N^2/\ln N = 2$ can be ascribed to standard random walks whose steps have finite mean square displacements. This is also in accord with the knowledge that a random walker visits every point in two dimensions, but not so in higher dimensions.

To get a deeper understanding of Eq. (33), let us write $P_n(x)$ as

$$P_n(x) = \int P_{n-1}(x - x')\,p(x')\,dx', \tag{34}$$

which transforms in Fourier space to (due to its convolution form)

$$P_n(k) = [p(k)]^n. \tag{35}$$

For small k (large x),

$$p(k) = \int p(x)\exp(ikx)\,dx = 1 - \frac{\langle x^2\rangle k^2}{2} + O(k^4) \tag{36}$$

when $\langle x^2\rangle$ is finite. For this case $p(k) \sim \exp(-\langle x^2\rangle k^2/2)$, $P_n(k) \sim \exp(-n\langle x^2\rangle k^2/2)$, so $P_n(x)$ asymptotically behaves like a Gaussian with variance $n\langle x^2\rangle$.

How can we expand p(k) when $\langle x^2\rangle$ is infinite? Our analysis now parallels the discussion of fractal time, except here we address the properties of fractal space. For a specific example let us construct a random walk without a characteristic jump size. Let

$$p(x) = \frac{N - 1}{2N}\sum_{j=0}^{\infty} N^{-j}\left[\delta_{x,\,b^j} + \delta_{x,\,-b^j}\right]. \tag{37}$$

Jumps of size $\pm 1$, $\pm b$, $\pm b^2$, etc. can occur, but jumps an order of magnitude longer in base b occur an order of magnitude less often in base N. We make about N jumps of length unity before, on the average, a jump of length b occurs, and so on, until in a hierarchical fashion patchy clusters of all sizes are formed. We expect a fractal dimension of $\ln N/\ln b$ to appear. We need to analyze p(k) and, taking the Fourier transform of p(x), we arrive at

$$p(k) = \frac{N - 1}{N}\sum_{j=0}^{\infty} N^{-j}\cos(b^j k), \tag{38}$$

which is precisely the Weierstrass function called for by Richardson. We could again go through the Mellin transform analysis introduced in our discussion of fractal time, but here we will just note that p(k) satisfies the scaling relation

$$p(k) = N^{-1}p(bk) + \frac{N - 1}{N}\cos(k), \tag{39}$$

which has a solution which includes a term of the form

$$p(k) \sim 1 - |k|^{\beta} \sim \exp(-|k|^{\beta}) \quad \left(\text{with } \beta = \frac{\ln N}{\ln b}\right). \tag{40}$$
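Both the scaling relation and the fractional exponent can be checked numerically. The sketch below truncates the Weierstrass sum at a finite number of terms and uses N = 3, b = 2. These are illustrative choices satisfying $b^2 > N$, so that $\beta = \ln N/\ln b < 2$; the function name is an assumption.

```python
import numpy as np

def weierstrass_p(k, N=3.0, b=2.0, terms=60):
    """Truncated Weierstrass sum: (N-1)/N * sum_j N**(-j) * cos(b**j * k)."""
    j = np.arange(terms)
    return (N - 1.0) / N * np.sum(N ** (-j) * np.cos(b ** j * k))

N, b = 3.0, 2.0
beta = np.log(N) / np.log(b)

# Scaling relation p(k) = p(bk)/N + (N-1)/N * cos(k), up to truncation error.
k = 0.7
residual = weierstrass_p(k) - weierstrass_p(b * k) / N - (N - 1.0) / N * np.cos(k)

# If 1 - p(k) ~ |k|**beta, shrinking k by a factor b shrinks 1 - p by b**(-beta) = 1/N.
k_small = 1e-4
ratio = (1.0 - weierstrass_p(k_small / b)) / (1.0 - weierstrass_p(k_small))

print(f"beta = {beta:.4f}, residual = {residual:.2e}, ratio = {ratio:.4f}, 1/N = {1 / N:.4f}")
```

Comparing k with k/b sidesteps the log-periodic modulation that accompanies the power law, so the ratio approaches 1/N cleanly as k decreases.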

The exponential form with the fractional power in Eq. (40) comprises the non-Gaussian solutions to Levy's question addressed in Eq. (33). If one wants to add random variables (take a random walk) and have the probability distribution after n steps look like the probability distribution after one step (except for a change of scale), then the random variables are either Gaussian or have infinite second moments. This means the solution is either Gaussian or fractal.

LEVY WALKS

How does one use Levy flights in physics, since mean square distances are infinite? One approach to make the problem more physical is to take account of how long it would take to traverse a distance r. Let $\Psi(r, t)$ be the probability density to make a jump of displacement r in a time t. We write

$$\Psi(r, t) = \psi(t|r)\,p(r), \tag{41}$$

where p(r), as before, is the probability that a jump of length r occurs, and $\psi(t|r)$ is the conditional probability that, given a jump of length r occurred, it took a time t to complete. For Levy flights, we choose $p(r) \sim |r|^{-1-\beta}$ with $\beta < 2$ so $\langle r^2\rangle$ diverges. If we choose $\psi(t|r) = \psi(t)$ so jump distances and jump times are chosen independently, then a divergent $\langle r^2\rangle$ will still result. A coupled space-time memory can, however, get around this problem. Let us choose

$$\psi(t|r) = \delta\!\left(t - \frac{r}{v(r)}\right), \tag{42}$$

where v(r) is the velocity of a jump of length r. The form $\psi(t|r) = \exp(-[t - r/v(r)]^2)$ would do just as well. Fortunately, Kolmogorov has taught us how to calculate v(r) for isotropic homogeneous turbulent flows. He assumed that the average dissipation $\epsilon$ over a scale r would be independent of r. Now $\epsilon$ is the energy/time $\sim v(r)^2/t \sim v(r)^3/r$. For the dissipation to be constant, we need

$$v(r) = r^{1/3} \quad \text{(Kolmogorov scaling)}. \tag{43}$$

The energy E is proportional to $v(r)^2 \sim r^{2/3}$ and its Fourier transform $E_k \sim k^{-5/3}$. This $-5/3$ law is the best known version of Kolmogorov scaling. With this information we can now proceed to calculate the mean square displacement for turbulent diffusion.

TURBULENT DIFFUSION: A SPACE-TIME FRACTAL

We need to generalize our random walk to include the coupled memory $\Psi(r, t)$. Note that fractal time involved being stuck at one place for a hierarchical distribution of times. For Levy walks the walker gets stuck in the same momentum state for a hierarchical distribution of times. The probability density Q(r, t) for reaching a site r exactly at time t is given by

$$Q(r, t) = \sum_{r'}\int_0^t Q(r - r', t - \tau)\,\Psi(r', \tau)\,d\tau + \delta_{r,0}\,\delta(t), \tag{44}$$

where we account for reaching $r - r'$ at time $t - \tau$, and then taking a jump of displacement $r'$ which takes a time $\tau$ to complete. This is the coupled memory continuous-time version of Eq. (13) for $P_n(r)$. In Fourier-Laplace space we find, for the Green's function propagator,

$$Q(k, s) = [1 - \Psi(k, s)]^{-1}. \tag{45}$$

The probability P(r, t) to be at site r at time t is a little bit more complicated because one can take a jump which passes r at time t, but the jump continues and ends at a different site at a later time. For simplicity, one can get the right scaling by just focusing on walks which reach site r at time $t - \tau$, with the next jump not yet completed by time t. Then

$$P(r, t) = \int_0^t Q(r, t - \tau)\left[\int_{\tau}^{\infty}\psi(t')\,dt'\right]d\tau, \tag{46}$$

where $\psi(t) = \sum_r \Psi(r, t)$. In Fourier-Laplace space

$$P(k, s) = \frac{1 - \psi(s)}{s\,[1 - \Psi(k, s)]}. \tag{47}$$


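For a walk with $\langle r\rangle = 0$, $p(r) \sim |r|^{-1-\beta}$, and $\psi(t|r) = \delta(t - r/v(r))$ with $v(r) = r^{1/3}$, we find, for $\langle r^2(t)\rangle = -\mathcal{L}^{-1}\,\partial^2 P(k = 0, s)/\partial k^2$, where $\mathcal{L}^{-1}$ is the inverse Laplace transform, that

$$\langle r^2(t)\rangle \sim \begin{cases} t^3, & \text{for } \beta < 1/3; \\ t^{2 + (3/2)(1 - \beta)}, & \text{for } 1/3 < \beta < 5/3; \\ t, & \text{for } \beta > 5/3. \end{cases} \tag{48}$$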


For the case $\beta < 1/3$, we recover Richardson's law of turbulent diffusion. For large enough $\beta$ the mean square time spent in a flight segment becomes finite and Brownian motion is achieved, in accord with the Central Limit Theorem. If the memory were decoupled, then the calculation of $\langle r^2\rangle$ would involve $d^2p(k = 0)/dk^2$, which is infinite. For the coupled space-time memory, one calculates with $\int \exp(ikr)\,\psi(s|r)\,p(r)\,dr$ instead of p(k). This turns the infinity of the Levy flight into the temporal scaling of the Levy walk.
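The three regimes can be collected in a small helper that returns the growth exponent of the mean square displacement. This is only a lookup of the piecewise result quoted above. The function name is illustrative, and assigning the boundary values $\beta = 1/3$ and $\beta = 5/3$ to the outer regimes is an assumption made here (the intermediate formula is continuous there, so the choice does not change the value).

```python
def msd_exponent(beta):
    """Exponent gamma in <r^2(t)> ~ t**gamma for a Levy walk with Kolmogorov scaling."""
    if beta <= 1.0 / 3.0:
        return 3.0                       # Richardson regime
    if beta >= 5.0 / 3.0:
        return 1.0                       # Brownian regime
    return 2.0 + 1.5 * (1.0 - beta)      # intermediate regime

for beta in (0.2, 1.0 / 3.0, 1.0, 5.0 / 3.0, 1.9):
    print(f"beta = {beta:.3f}  ->  exponent = {msd_exponent(beta):.3f}")
```

The exponent falls continuously from 3 to 1 as $\beta$ increases, passing through ballistic growth ($t^2$) at $\beta = 1$.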

One advantage of this approach is that one can visualize the types of random-walk trajectories which can lead to turbulent motions. F. Hayot has actually implemented the Levy Walk model to simulate turbulent pipe flow. Instead of the
parabolic  velocity  profile  of  laminar  flow,  a  better  mixed  flow  with  a  flatter  velocity 
profile  is  discovered.  Comparing  his  calculated  velocity  profile  with  experimental  ve¬ 
locity  profiles,  he  is  able  to  associate  a  Reynolds  number  of  about  10^  with  the  flow. 
Basically, the Levy Walk zeroth-order state is already turbulent for small enough $\beta$, while the traditional lattice gas hydrodynamics is based on nearest-neighbor collisions, which corresponds to low Reynolds number flow. Enormous computing power, or tricks, would be needed to reach a turbulent state in the standard lattice gas hydrodynamic approach. Turbulence is natural for the Levy Walk approach. Phase diffusion in Josephson junctions and transport in stochastic webs are other examples where Levy walks occur.

5. BERNOULLI SCALING

It is of interest that the history of probability theory has already provided us with a beautiful example of the types of scaling discussed in this paper. The problem involves a certain game of chance. The game is to flip a coin until a head appears. This could take only one flip with probability 1/2, or n flips (i.e., one gets $n - 1$ tails in a row before a head appears) with probability $1/2^n$. Suppose you win $2^{n-1}$ coins if $n - 1$ tails appear before the head appears. Then your expected winnings are $1 \times 1/2 + 2 \times 1/4 + \ldots + 2^n/2^{n+1} + \ldots = \infty$. This game was introduced by Nicolaas Bernoulli (the nephew of Jacob and John) in the early 1700s. It is called the St. Petersburg Paradox because Daniel Bernoulli wrote about it in the Commentary of the St. Petersburg Academy. The question which was posed was how much ante should be required to play the game. The player favors a small ante because he will win only 1 coin with probability 1/2, 2 or fewer coins with probability 3/4, 4 or fewer coins with probability 7/8, etc. The banker, who must take on all comers, favors an infinite ante because this is his expected loss. The two parties cannot come to an agreement because they are trying to determine a characteristic scale from a distribution which does not possess one. All scales enter and the probabilities for all possible winnings add up to unity. However, an order of magnitude greater winnings occurs with an order of magnitude less probability. This example is the forerunner of fractal time, where the waiting times between jumps occur on all scales, but with order of magnitude longer waits occurring an order of magnitude less often. The fact that $\langle t\rangle$ was infinite for a fractal time process did not mean that the duration between every event was infinite, just as in this coin game not every player wins an infinite amount of money just because the expected winning is infinite. The perception of this paradox in the 1700s was to cast aspersions on the ability of probability to have a sound mathematical foundation. In the 20th century we see this paradox as a rich example of scaling with all its inherent exponents, fractal dimensions, renormalizations, and natural description of complex systems.
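The arithmetic of the game is easy to reproduce exactly. The sketch below uses exact fractions. The function names are illustrative, and the truncated game (no payout if no head has appeared after `max_flips` flips) is an assumption introduced only to show how the expectation grows without bound.

```python
from fractions import Fraction

def payoff(n):
    """Coins won when the first head appears on flip n (after n - 1 tails)."""
    return 2 ** (n - 1)

def probability(n):
    """Probability that the first head appears on flip n."""
    return Fraction(1, 2 ** n)

def truncated_expectation(max_flips):
    """Expected winnings if the game is cut off after max_flips flips."""
    return sum(payoff(n) * probability(n) for n in range(1, max_flips + 1))

def prob_win_at_most(coins):
    """Probability of winning no more than the given number of coins."""
    total, n = Fraction(0), 1
    while payoff(n) <= coins:
        total += probability(n)
        n += 1
    return total

for flips in (1, 10, 100):
    print(flips, truncated_expectation(flips))
for coins in (1, 2, 4):
    print(coins, prob_win_at_most(coins))
```

Each allowed flip adds exactly half a coin to the expectation, so the truncated expectation grows linearly with the cutoff while the chance of a large win stays small.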
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Dy  nam  ics  of  Web  Maps :  Parameter 
Dependence of Stochastic Layers


INTRODUCTION 

Zaslavsky’s  web  arises  from  the  following  idealized  thought  experiment. 

A charged particle moves in the xy-plane perpendicular to a uniform magnetic field. If undisturbed, it moves in a circle with uniform angular frequency $\Omega$, and this imposes a linear relation between coordinate and velocity components:

$$\dot{x} = \Omega(y - y_c), \tag{1}$$

$$\dot{y} = -\Omega(x - x_c), \tag{2}$$

where $(x_c, y_c)$ is the center of the circle. Now suppose the particle is subjected to instantaneous kicks, q times per revolution, by an electric field

$$\mathbf{E} = \mathbf{e}_y E_0 \sin(ky)\sum_{n}\delta\!\left(t - \frac{2\pi n}{q\Omega}\right). \tag{3}$$

Since each kick leaves y and $\dot{x}$ unchanged, Eq. (1) is valid for all times, with $y_c$ a constant of the motion, which we choose to be zero without loss of generality. The appearance of a typical orbit is suggested in Figure 1. We note that, with the aid of Eq. (1), the relationship between the velocity vectors just prior to successive kicks can be expressed recursively, as an area-preserving mapping M (the web map) of the velocity plane,

$$M: \begin{pmatrix} u \\ v \end{pmatrix} \mapsto \begin{pmatrix} \cos\frac{2\pi}{q} & \sin\frac{2\pi}{q} \\ -\sin\frac{2\pi}{q} & \cos\frac{2\pi}{q} \end{pmatrix}\begin{pmatrix} u \\ v + a\sin u \end{pmatrix}, \tag{4}$$

where $u = k\dot{x}/\Omega$, $v = k\dot{y}/\Omega$, and a is a dimensionless parameter proportional to the kick amplitude.
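A minimal numerical sketch of the web map may help fix ideas. It assumes the map acts as a kick $v \to v + a\sin u$ followed by a clockwise rotation through $2\pi/q$; the function names, the initial point, and the parameter values are illustrative only.

```python
import numpy as np

def web_map(u, v, a, q=5):
    """One application of M: kick v by a*sin(u), then rotate clockwise by 2*pi/q."""
    alpha = 2.0 * np.pi / q
    w = v + a * np.sin(u)
    return (u * np.cos(alpha) + w * np.sin(alpha),
            -u * np.sin(alpha) + w * np.cos(alpha))

def jacobian(u, v, a, q=5):
    """Tangent map of M at (u, v): rotation matrix times the shear from the kick."""
    alpha = 2.0 * np.pi / q
    rotation = np.array([[np.cos(alpha), np.sin(alpha)],
                         [-np.sin(alpha), np.cos(alpha)]])
    kick = np.array([[1.0, 0.0],
                     [a * np.cos(u), 1.0]])
    return rotation @ kick

def orbit(u0, v0, a, steps, q=5):
    """Iterates of (u0, v0) under M, as an array of shape (steps + 1, 2)."""
    points = [(u0, v0)]
    for _ in range(steps):
        points.append(web_map(*points[-1], a, q))
    return np.array(points)

points = orbit(0.1, 0.2, a=0.5, steps=1000)
print(points.shape)
print(np.linalg.det(jacobian(0.3, -1.2, 0.7)))
```

The determinant of the tangent map is unity for every point and every a, which is the area-preserving property, and for a = 0 the map reduces to a pure rotation of period q.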

Although  the  assumptions  which  went  into  the  above  model  are  unrealistic  for 
practical  applications  in  plasma  physics,  it  is  of  some  theoretical  interest  whether 
such  a  simple  mechanism  can  lead  to  unbounded  acceleration  of  particles.  That 
is,  are  there  initial  conditions  under  which  repeated  application  of  the  map  M 
leads  to  ever  increasing  velocities?  Or  is  there  a  dynamical  obstacle  that  prevents 
orbits  from  marching  out  to  infinity  in  the  uv-plane?  Such  an  obstacle  would  be  a 
closed  invariant  curve  surrounding  the  origin.  Since  the  interior  of  such  a  curve  is 
mapped  into  itself  by  the  area-preserving  map  M,  no  orbit  initially  inside  can  cross 
to the outside. Because of their well-known role in the theorems of Kolmogorov,
Arnold,  and  Moser,  we  shall  frequently  refer  to  simple,  closed,  invariant  curves  as 
KAM  curves. 

The answer to our question is relatively simple in the cases q = 3, 4, or 6, where the KAM curves are restricted to the interiors of cells which tile the plane periodically, and there is an infinitely extended web of unbounded chaotic orbits. The cases q = 5, 7, 8, etc. are far more subtle. Once again, for sufficiently large a, there is a web of chaotic orbits, this time with an apparent quasi-crystalline symmetry. Now periodicity no longer limits the size of KAM curves, and in fact numerical studies show that in the $a \to 0$ limit there exist closed invariant curves of arbitrarily large radius. Based on perturbative calculations and numerical experiments, one expects an inexorable increase in the area occupied by chaotic orbits as the parameter a increases, so that all the KAM curves are eventually swallowed up by chaos. Beyond this broad picture, very little is known about the parameter evolution of KAM curves and quasi-crystalline stochastic webs. In this lecture we report on a small step toward gaining this understanding. We shall concentrate on a particular piece of the five-fold web, and try to follow graphically the evolution of one of its boundaries. We shall find that the behavior is a good deal more complicated than one might have guessed from studies of simpler maps.

FIGURE 1 Motion of a periodically kicked charged particle in a uniform magnetic field. Here kicks in the y direction occur four times per revolution. Note the constancy of the y-component of the center of circular motion.


BASIC  CONCEPTS 

Before  discussing  our  numerical  explorations,  it  is  important  to  make  more  precise 
some  of  the  basic  concepts.  We  restrict  ourselves  to  the  web  map  of  Eq.  (4)  with 
q  =  5. 

SYMMETRIES 

The geometry of typical orbits in the uv-plane (the so-called phase portrait) is characterized by some important symmetries. First of all, there is an invariance under the map M itself, which, for all points except those close to the origin, is approximately a clockwise rotation by $2\pi/5$. There are additional exact symmetries of M-invariant objects, namely mirror reflections about the axes inclined at polar angles $3\pi/10$ and $4\pi/5$ (the product of the two reflections is just a total inversion). Exploiting these mirror symmetries is crucial to the effectiveness of numerical methods applied to long orbits.


FIXED  POINTS 

There are infinitely many fixed points of the fifth-iterate map $M^5$ (this is the map, rather than M itself, which marches in small steps and traces out the “shape” of an invariant curve or stochastic layer), and they form an approximate quasi-crystalline array in the uv-plane. Since the map preserves areas, there are only two types of fixed points: stable (or elliptic) and unstable (or hyperbolic). The former are the centers of small-scale circulation and are not of particular interest to the present investigation. The hyperbolic fixed points, on the other hand, determine what we mean by the “shape” of long invariant curves. Intersecting at each such fixed point are two special $M^5$-invariant curves, its stable and unstable manifolds. Along the stable manifold, an orbit moves toward the fixed point, with the steps decreasing in size geometrically; along the unstable manifold, orbits move away from the fixed point, with steps increasing in size geometrically. Typical orbits in the vicinity of an unstable fixed point are “scattered” by it; they approach it along a stable direction and leave along an unstable one. A long orbit will visit the neighborhoods of many hyperbolic fixed points. Its shape will resemble a complicated polygon, with a rounded corner wherever it passes near a hyperbolic fixed point (see Figure 3 for some simple examples).


STOCHASTIC  LAYERS 

In  the  special  case  of  an  integrable  model,  the  unstable  manifold  of  a  hyperbolic 
fixed  point  ends  at  another  such  point,  forming  part  of  a  separatrix  curve.  On  the 
other  hand,  a  generic  perturbation  of  an  integrable  model  leads  to  breakdown  of 
separatrices  and  their  replacement  by  stochastic  layers  of  nonzero  thickness  within 
which  the  hyperbolic  fixed  points  are  located.  Almost  all  of  the  orbits  within  the 
layer  are  chaotic,  and  if  one  assumes  ergodicity  within  the  layer  (islands  of  stability 
may  be  surrounded  by  the  layer  but  are  not  considered  part  of  it),  one  can  define 
the layer to be the closure of any one of its chaotic orbits.

The web map is believed to be integrable only in the limit of vanishing a (the first-order approximation to $M^5$ is a Hamiltonian flow). Hence we expect to find stochastic layers associated with all the hyperbolic fixed points of the fifth-iterate map.


STOCHASTIC  WEBS  AND  KAM  CURVES 

Being  embedded  in  a  common  stochastic  layer  is  clearly  a  connectivity  equivalence 
relation  for  hyperbolic  fixed  points.  The  complete  system  of  those  connectivity 
components  which  surround  the  origin  resembles,  in  structure,  a  spider  web  (at 
least  if  a  is  neither  too  large  nor  too  small),  and  hence  the  name  stochastic  web. 
Our original questions concerning the possible trapping of orbits can be rephrased as follows: given a, does the stochastic web have a finite component? The alternative is that there is a connected web extending throughout the plane. If there is a finite component, then its outer boundary is a closed invariant curve C which traps all orbits with initial point interior to C. Typically, there will be a narrow annular region, between C and the web component surrounding C, within which quasi-periodicity reigns (in the form of closed invariant curves interlaced with island chains).

The  relationship  between  hyperbolic  fixed  points,  stochastic  layers  and  webs, 
and  KAM-dominated  annuli  is  sketched  in  Figure  2.  From  our  previous  remarks 
about  symmetries,  it  is  obviously  sufficient  to  restrict  our  attention  to  a  sector  of 
angle $\pi/5$.

FIGURE 2 Sketch showing schematically two stochastic layers (shaded) which form connected components of a stochastic web. KAM curves, interspersed with island chains, are to be found in the annular region between the layers.


Ultimately we would like to know the connectivity structure of the stochastic web as a function of a. In particular, what is the smallest value $a_0$ such that for all $a > a_0$, the web has only a single component? As we shall see below, we are only beginning to make some headway toward answering such questions.


DETERMINATION  OF  STOCHASTIC  LAYER  BOUNDARIES 

To investigate how stochastic layers evolve, it is crucial to have a reliable operational definition of the boundary of a stochastic layer. From its definition and the
assumption  of  restricted  ergodicity,  one  might  try  choosing  a  “random”  assortment 
of  initial  conditions  well  within  the  layer  and  simply  iterating  away,  thus  defining 
the  boundary  as  the  limiting  envelope  of  the  chaotic  orbits  explored  in  this  fashion. 
In  many  instances  this  can  be  misleading,  due  to  the  presence  near  the  boundary  of 
cantori:  invariant  fractal  sets  which  can  block  the  convergence  of  the  envelope  to 
the  true  boundary  for  unacceptably  long  times.  More  efficient  is  an  approach  from 
the  quasi-periodic  side,  thinking  of  the  boundary  as  the  “last  KAM  curve”  beyond 
which  quasi-periodic  orbits  on  closed  curves  are  impossible. 

A straightforward method of doing this has been given by Greene. The main
idea is to concentrate on a dense set of “noble” KAM curves. For a quasi-periodic
orbit on such a curve, the average number of revolutions about the origin per map
iteration (the rotation number) has a continued-fraction representation of the form

$$[m_0, m_1, m_2, \ldots, 1, 1, 1, \ldots] = m_0 + \cfrac{1}{m_1 + \cfrac{1}{m_2 + \cdots}} \qquad (5)$$

In  the  neighborhood  of  a  noble  KAM  curve,  one  expects  to  find  a  sequence  of  stable 
periodic orbits with rational rotation numbers in which the continued-fraction
representation (5) is truncated at n levels. According to Greene, there is a quantity R,
the residue, whose scaling behavior, as one proceeds down the sequence of approximating
periodic orbits, gives a criterion for the existence of a KAM curve with the
given  irrational  rotation  number.  If  the  limiting  curve  exists,  R  decreases  to  zero  at 
a  geometric  rate;  if  the  curve  does  not  exist,  the  residue  blows  up  geometrically.  For 
the  borderline  case,  applicable  to  the  “last  KAM  curve,”  R  approaches  a  constant 
close  to  0.25. 
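The residue itself is not defined in the text above. In Greene's work on area-preserving maps it is computed from the tangent (linearized) map $M$ of a periodic orbit as $R = (2 - \mathrm{Tr}\,M)/4$, with $0 < R < 1$ corresponding to a linearly stable orbit. The following is a minimal sketch under that definition; the function names and the two test matrices are illustrative assumptions and are not tangent maps of the web map.

```python
import math

def greene_residue(tangent_map):
    """Residue R = (2 - trace) / 4 of a 2x2 area-preserving tangent map."""
    (a, _b), (_c, d) = tangent_map
    return (2.0 - (a + d)) / 4.0

def classify(residue):
    """Linear stability of the periodic orbit implied by its residue."""
    if 0.0 < residue < 1.0:
        return "elliptic (stable)"
    if residue == 0.0 or residue == 1.0:
        return "parabolic (marginal)"
    return "hyperbolic (unstable)"

def rotation(theta):
    """Tangent map of a pure rotation, used here only as a toy stable case."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Toy examples (assumptions, not web-map orbits):
r_stable = greene_residue(rotation(math.pi / 3))        # trace = 1
r_unstable = greene_residue([[2.0, 1.0], [1.0, 1.0]])   # det = 1, trace = 3

print(r_stable, classify(r_stable))
print(r_unstable, classify(r_unstable))
```

In an application of the criterion, one would evaluate the residue along the sequence of approximating periodic orbits and watch whether it decays, grows, or settles near the borderline value quoted above.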

Applying  Greene’s  criterion  requires  an  ability  to  locate  long  periodic  orbits 
rapidly  and  precisely.  Fortunately  the  existence  of  a  symmetry  axis  greatly  enhances 
our ability to make such calculations for the q = 5 web map.


EXPLORATION  OF  A  SPECIFIC  STOCHASTIC  LAYER 

We  now  turn  to  a  computational  exploration  of  a  “typical”  stochastic  layer  of  the 
q  =  5  web  map.  What  I  shall  describe  here  is  a  graphical  depiction  of  the  parametric 
evolution  of  the  selected  layer.  This  is  one  of  several  approaches  (including  a  direct 
application  of  Greene’s  criterion)  which  I  have  used  to  investigate  this  particular 
layer. The interested reader will find a detailed account in Lowenstein.

The map $M^5$ has a hyperbolic fixed point $(u_0, v_0)$ on the symmetry line near
the point (-6.5, 17) (its precise location depends on a, and is easily found by a
one-dimensional search along the symmetry line). For values of a between about 0.21 and
0.36, the fixed point is immersed in a web component, one-tenth of which is shown in
Figure 3(b). For at least part of the parameter range, this component is surrounded
by the larger one shown in Figure 3(c), and the two are separated by a thin annular
region. Figure 3(a) shows an orbit within the annulus. The relationships among
the two stochastic layers, their associated fixed points, the annulus, and the origin
are indicated schematically in Figure 2. Each of the pictures of Figure 3(a)-(c) is
a plot of a single orbit with an arbitrarily selected initial point on the symmetry
line. Although each of the orbits surrounds the origin, all points are mapped by
symmetry transformations into the same sector of opening angle $\pi/5$.

To study the detailed evolution of the stochastic layer, we focus on a small
neighborhood of the hyperbolic fixed point. This has the advantage of spreading
out the orbits along the symmetry line, and allows a natural definition of the layer’s
width, namely the distance of the boundary from the fixed point, measured along
the symmetry line.

FIGURE 3 Plots of long orbits starting near the hyperbolic fixed point $(u_0, v_0)$. Using
symmetries, points are plotted in a single sector with opening angle 36°. The three
orbits are (a) in the annulus between stochastic layers (close to a KAM curve), (b) in
the inner stochastic layer (chaotic orbit), and (c) in the outer stochastic layer (chaotic
orbit).

Figure  4  shows  a  sequence  of  twelve  snapshots  of  representative  orbits  near  the 
stochastic layer boundary, for equally spaced parameter values between 0.3030 and
0.3041.  The  plots  are  generated  as  follows: 

i. For each a, the hyperbolic fixed point $(u_0, v_0)$ is found by a search along the
symmetry line. The field of view is set to be $u_0 - 0.06 < u < u_0 + 0.006$,
$v_0 - 0.0054 < v < v_0 + 0.054$.

ii. Using the symmetry operations, we endow the angle $\pi/5$ sector straddling the
symmetry line with periodic boundary conditions.

iii.  Preliminary  graphical  explorations  indicate  that  the  primary  periodic  orbits 
(i.e.,  the  minimal  ones,  corresponding  to  the  largest  island  chains)  near  the 
layer  boundary  have  periods  between  494  and  510  in  the  selected  parameter 
range. Note that these are periods for traversing the reentrant sector; the
corresponding orbits circling the origin require approximately ten times as many
iterations of $M^5$.

iv. For primary period n, we locate periodic orbits with rotation numbers

$$[n, 4, 1, 1, 1, 1] = n + \tfrac{5}{23},$$

$$[n, 2, 1, 1, 1, 1, 1] = n + \tfrac{8}{21},$$

$$[n, 1, 1, 1, 1, 1, 1, 1] = n + \tfrac{13}{21},$$

$$[n, 1, 3, 1, 1, 1, 1] = n + \tfrac{18}{23},$$

using a search on the symmetry axis. Outside the stochastic layer, according
to Greene’s analysis, these orbits belong to island chains approximating KAM
curves. Within the layer they are almost certainly unstable.

v. Starting near each of the periodic points (10“^ off the symmetry axis), we
iterate 15,000 times, plotting all those points which fall within the field of
view. The near-KAM orbits will appear as dashed curves (actually lines of very
narrow islands), while those well within the layer will soon show their chaotic
character. Just inside the layer boundary, the orbits will spend a great deal of
time near cantori, and the plot will show a fuzzy dashed curve.

FIGURE 4 Snapshots of orbits near the inner stochastic layer boundary, with the
parameter a increasing from 0.3030 to 0.3041 in steps of 0.0001.
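The four continued fractions of step iv can be evaluated exactly with rational arithmetic. The sketch below is only an illustration of that arithmetic; the helper names are hypothetical, and the primary period n is left out because it simply adds an integer to each value.

```python
from fractions import Fraction

def continued_fraction_value(terms):
    """Evaluate [t0, t1, ..., tk] = t0 + 1/(t1 + 1/(... + 1/tk)) exactly."""
    value = Fraction(terms[-1])
    for t in reversed(terms[:-1]):
        value = t + 1 / value
    return value

def offset_from_primary_period(tail):
    """Rotation number minus n for the continued fraction [n, *tail]."""
    return continued_fraction_value([0] + list(tail))

# Tails of the four rotation numbers listed in step iv.
tails = [
    (4, 1, 1, 1, 1),
    (2, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 1, 1),
    (1, 3, 1, 1, 1, 1),
]
offsets = [offset_from_primary_period(tail) for tail in tails]
print(offsets)  # [Fraction(5, 23), Fraction(8, 21), Fraction(13, 21), Fraction(18, 23)]
```

The four offsets are increasing, so for a given n the corresponding orbits are ordered by rotation number between n and n + 1.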

A  glance  at  a  plot  generated  in  this  manner  gives  one  a  fairly  good  idea  of  the 
location  of  the  stochastic  layer  boundary.  This  is  confirmed  by  comparison  with  the 
more  precise  results  obtained  by  systematic  application  of  Greene’s  criterion. 

What  does  our  sequence  of  snapshots  reveal  about  the  parametric  evolution  of 
the  stochastic  layer  in  question?  The  first  three  frames  show  no  dramatic  changes 
in  the  stochastic  layer,  only  an  extremely  gentle  expansion.  In  Figure  4(d)-(f),  a 
remarkable  increase  of  stability  occurs  deep  within  the  layer.  The  orbits  are  chaotic, 
but they spread out very little during the 15,000 iterations (this is particularly
evident in Figure 4(f)). This is a prelude to the birth of a narrow channel of regularity
which is barely visible in Figure 4(g) but grows rapidly and moves upward while
the  chaotic  layer  above  it  gradually  dissolves.  By  the  last  frame  we  are  more  or  less 
back  to  where  we  were  at  the  beginning  of  the  sequence. 

The more comprehensive exploration of Lowenstein shows that the cycle
depicted in Figure 4 is repeated many times, with a generally increasing amplitude,
as a increases from 0.30 to 0.36.

Perhaps the most interesting feature of Figure 4 is the discontinuous decrease,
between frames (f) and (g), in the width of the stochastic layer. Finer subdivision
of the interval reveals that the collapse is at least by a factor of 4. Additional
work will be needed to probe the details of the bifurcation which opens up an inner
channel between a = 0.3035 and a = 0.3036, as well as to find the dynamical origin
of the phenomenon.


Non-local cellular automata are fully discretized and uniform high-dimensional
dynamical systems with non-local interactions. It is emphasized that
although non-local interaction is not considered a correct description of
the physical world at its lowest level, at higher levels it is nevertheless
an important feature of systems in, for example, biology and economics.
Many properties of non-local cellular automata are investigated in another
publication. In this lecture note, I will only highlight a few topics, including
the analytic approximation of macroscopic dynamics, systems of coupled
selectors, and group meeting problems.


FROM  LOCAL  TO  NON-LOCAL  DYNAMICAL  SYSTEMS 

One of the most important aspects of a complex system is its time evolution
following a rule which does not change in time. A point of view, though perhaps
extreme, is that since all physical laws are fixed (for example, there is no evidence
that the gravitational force falls off as $1/r^2$ today, where r is the distance between
two massive objects, but as some other power of r tomorrow), whatever has happened on the
earth is a realization of a gigantic dynamical system with those fixed physical laws.
With this point of view, the evolution of life as well as natural selection can also be
modeled by complex dynamical systems with fixed rules, although this modeling
will be extremely difficult because the evolution of life is much more complex than
practically all model dynamical systems that we have known.

Since  physical  laws  are  local  (there  is  no  experimental  evidence  that  physical 
interaction  can  be  accomplished  nonlocally),  one  may  argue  that  we  only  need 
locally  coupled  high-dimensional  dynamical  systems  to  model  everything.  In  other 
words,  there  is  no  need  to  introduce  nonlocality  in  the  model  dynamical  system. 

However,  in  a  more  realistic  modeling  of  the  world  around  us,  we  do  not  come 
down  to  the  very  end  of  the  microscopic  description.  The  entities  that  interact  with 
each other are not quarks, nuclei, atoms, or molecules, but things like neurons in
a brain, animals in an ecological system, or agents in a stock market. As the level
of description increases, two notions change. First, the dynamical rules
may not be fixed in time (they are no longer the golden, universal, time-invariant
physical laws). Second, the interaction between entities may not be local.

If  the  dynamical  rule  is  not  time-invariant,  it  can  be  very  hard  to  describe 
and  to  study  the  resulting  dynamical  system,  unless  the  dynamics  of  the  rule  is 
describable.  In  other  words,  we  need  two  sets  of  dynamical  systems:  at  the  higher 
level,  there  is  a  dynamics  of  the  rules,  and  at  the  lower  level,  there  is  a  dynamics  of 
the  entities.  Many  evolutionary  models  are  of  such  nature.  The  complexity  of  the 
system  results  from  the  interplay  between  higher-level  and  lower-level  dynamics. 
One  can  even  imagine  three  or  more  levels  of  dynamics,  in  which  the  entity  of  the 
higher  level  is  the  rule  of  the  level  one  step  lower. 

These multi-level dynamical systems are fascinating systems to study. But they
are outside the realm of this lecture note. For the time being, to start from the
simplest scenario, I will assume that the lower-level rules are not changed. An
explanation for this assumption is that the higher-level rules function on a much longer
time scale, so that during this time scale, the lower-level rules can be considered as
unchanged.

The issue I want to address here is the following: what happens when nonlocality
is introduced into a dynamical system? One should know that the locality of
interaction is a terrible assumption for many systems with a high-level description.
For  example,  the  transmission  of  signal  from  one  neuron  to  another  in  the  brain 
is  through  axons  and  dendrites.  The  distance  between  two  neurons  is  an  irrelevant 
piece  of  information  concerning  whether  or  not  the  two  are  connected  to  each  other. 

Note  that  the  nonlocality  at  this  level  of  the  description  (interaction  between 
neurons)  does  not  contradict  the  locality  at  the  microscopic  level:  the  traveling 
chemical  signals  do  obey  local  physical  and  chemical  laws.  This  fact,  however,  does 
not prevent us from explicitly incorporating the nonlocality into the modeling
process when describing the interaction between neurons.

Similarly, two agents or brokers in a stock market can communicate via telephone
line regardless of how far or close the two are to each other. Again, there is no
contradiction with the locality of the physical laws. Admittedly, the electrical signal
does take a longer time to travel along a longer telephone line than a short one, but the
difference is so small compared with the time scale of the stock market activities
that this fact is irrelevant. Who makes phone calls to whom (whether the
connectivity is one or zero) is more important information than the actual distance
between them.

There are many, many other examples. What we have learned from this discussion
is that when the level of description of a system is increased, one sometimes
needs  to  explicitly  introduce  nonlocality  to  the  modeling.  This  nonlocality  does  not 
violate  the  locality  at  the  lowest  level  description — the  physical  description. 


CELLULAR  AUTOMATA  AS  A  FULLY  DISCRETIZED  AND 
UNIFORM  HIGH-DIMENSIONAL  DYNAMICAL  SYSTEM 

What is a cellular automaton? With so many introductory articles and books
existing on this topic, I will refer the reader to the original publications (e.g., Toffoli and
Margolus, and Wolfram). To put it into simple terms, one can say that cellular
automata are high-dimensional, fully discretized, synchronous, uniform, and
locally coupled dynamical systems. There are high-dimensional dynamical systems
that are not fully discretized, such as partial differential equations, coupled
differential equations, and lattice maps (e.g., Crutchfield and Kaneko). There are also
high-dimensional, fully discretized dynamical systems that are not synchronous or
uniform. The model systems I will introduce are high-dimensional, fully discretized,
synchronous, and uniform dynamical systems with nonlocal connections.

There  exist  other  names  that  can  describe  nonlocally  coupled,  high-dimensional 
dynamical  systems;  for  example,  automata  networks,  or  simply,  networks.  I  will  use 
the name “nonlocal cellular automata” to have a closer reference to the locally
coupled cellular automata, in order to emphasize the uniformity and synchronousness
of  the  system. 

Suppose the state value of the component i at time t is $x_i^t$, and the total number
of components in the system is N; then the state configuration of the system consists
of the state value of each component: $\{x_1^t, x_2^t, \ldots, x_N^t\}$. An n-input nonlocal cellular
automaton is defined by the rule f(.):

$$x_i^{t+1} = f\left(x^t_{j_1(i)}, x^t_{j_2(i)}, \ldots, x^t_{j_n(i)}\right) \qquad (1)$$

which says that each component i updates its state value by checking the state
values of n other components, which have indexes $j_1(i), j_2(i), \ldots, j_n(i)$. Knowing
the state values of these components, and knowing the rule f(.), which is written as
a rule table (a list of all possible n-component configurations together with the state
value each leads to), we are able to determine what the state value of the component
i is at the next time step ($x_i^{t+1}$).
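As one concrete reading of Eq. (1), the sketch below performs a single synchronous update of a two-state system. It is only an illustration: the restriction to states 0 and 1, the system size, the randomly drawn rule table, and all function names are assumptions made for the example.

```python
import random
from itertools import product

def random_wiring(num_components, num_inputs, rng):
    """wiring[i] lists the indexes j_1(i), ..., j_n(i) that component i reads."""
    return [[rng.randrange(num_components) for _ in range(num_inputs)]
            for _ in range(num_components)]

def random_rule_table(num_inputs, rng):
    """Rule table: every n-tuple of input states maps to one output state."""
    return {inputs: rng.randrange(2)
            for inputs in product((0, 1), repeat=num_inputs)}

def step(config, wiring, rule_table):
    """Synchronous, uniform update: every component applies the same rule f."""
    return [rule_table[tuple(config[j] for j in inputs)] for inputs in wiring]

rng = random.Random(0)
N, n = 16, 3                      # illustrative sizes only
wiring = random_wiring(N, n, rng)
rule = random_rule_table(n, rng)
config = [rng.randrange(2) for _ in range(N)]
next_config = step(config, wiring, rule)
print(config)
print(next_config)
```

The wiring is drawn once and then held fixed, so repeated calls to `step` iterate a deterministic map on the configuration space.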
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There  are  other  types  of  networks  previously  studied.  One  of  them,  studied 
by Walker and Ashby and which might be called “Ashby nets,” is also uniform and
synchronous, but not all inputs are randomly chosen: one input is always the
component itself. More about Ashby nets will be discussed in the next section.

Another type of net, which might be called the “Kauffman net,” is studied
in Kauffman. These nets are synchronous, but not uniform: the rule acting on
one component may differ from that on another component. It is well known that
different rules can lead to different dynamical behaviors, so mixing all of them
into one system leads to rather poor statistics. If the number of inputs (n) and
the number of components (N) are fixed, and we ask the question of what the
“typical” transient and cycle times are, there will be no “good” answer. The median
value (see, for example, Press et al. for a definition of the median as well as the
mean value) of a widely spread distribution of these statistical quantities, as used
in Kauffman, may not give a true “typical” value. Numerical results show that
median cycle lengths for these nets are quite different from the mean cycle lengths,
though I will not include these results nor discuss this type of net further here.


WIRING  DIAGRAM 

Besides  the  dynamical  rule,  the  wiring  diagram  of  a  network  also  plays  an  important 
role  in  determining  the  dynamics.  It  could  happen  that  with  the  same  rule,  some 
wiring  diagrams  lead  to  one  type  of  dynamics,  while  other  wiring  diagrams  lead  to 
another.  When  we  talk  about  dynamics  of  a  nonlocal  cellular  automaton  rule,  there 
is  an  implication  that  almost  all  wiring  diagrams  (“typical”  or  randomly  chosen) 
lead  to  the  same  dynamics. 

The situation is analogous to that of local cellular automata. For local cellular
automata, we also talk freely about the dynamics of a rule, without specifically
mentioning the initial configuration. It is again implied that almost all typical or
randomly chosen initial configurations lead to the same dynamics. This idea is
essential to the concept of “attractor”; that is, whatever the initial conditions, they
are all attracted to the same limiting behavior.

The  wiring  diagram  dictates  where  to  take  inputs  for  each  component.  In  some 
sense,  it  determines  the  direction  of  information  flows.  Obviously,  wiring  diagrams 
with  different  topological  structures  will  transmit  information  in  different  ways,  and 
dynamical  behaviors  can  also  be  different. 

For  example,  if  one  assumes  that  for  each  component  i,  one  of  its  inputs  is 
always  itself: 

for all i, $j_1(i) = i$, but the other $j_k(i)$ are random ($k = 2, 3, \ldots, n$), (2)

then the wiring diagram will not be completely random. I called this type of wiring
diagram partially local or partially nonlocal.

Some nonlocal cellular automata with partially local wiring diagrams were studied
in Walker and Ashby. These are 3-input, 2-state, nonlocal cellular automata
with the second input being the component itself:

for all i, $j_2(i) = i$, but $j_1(i)$ and $j_3(i)$ are random. (3)

It has been shown that for many 3-input, 2-state rules, fully nonlocal wiring
diagrams and partially local wiring diagrams lead to different dynamics.

Another issue related to the wiring diagram is how the dynamics
are affected by changing the number of inputs. Numerical results show that
the number of inputs is important in determining the dynamical behavior. If one
randomly picks a rule, the more inputs one has, the more likely the dynamics are
chaotic. For local cellular automata, increasing the number of inputs
increases the percentage of rules that are chaotic. More detailed discussions are in Li,
Packard, and Langton.

Now back to the discussion of nonlocal cellular automata. Even though each
component is supposed to receive n inputs, a particular realization of the random
number generator may actually assign two inputs to be the same. If this happens,
the rule as applied to that particular component will have one fewer input
than it should have. And if many other components also have this degeneracy of
inputs, it is more likely that the resulting dynamics acts as if the number of inputs were
smaller. Some specific examples of the difference between the degeneracy-permitted
and distinct-input diagrams are presented in Li.
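The wiring variants discussed in this section (fully random, partially local, and distinct-input) can be generated by one small routine. The sketch below is an assumption-laden illustration: `make_wiring`, its `self_slot` and `distinct` parameters, and the sizes used are hypothetical names and values chosen for the example. For n = 3 and degeneracy-permitted wiring, the chance that a given component has a repeated input is 1 - (1 - 1/N)(1 - 2/N), roughly 3/N.

```python
import random

def make_wiring(num_components, num_inputs, rng, self_slot=None, distinct=False):
    """Draw a wiring diagram.

    self_slot: 0-based position of the input forced to be the component
               itself (partially local wiring); None for fully random wiring.
    distinct:  if True, the inputs of each component are all different.
    """
    wiring = []
    for i in range(num_components):
        free = num_inputs if self_slot is None else num_inputs - 1
        if distinct:
            # Sample without replacement; leave out i when it is wired in by hand.
            pool = [j for j in range(num_components)
                    if self_slot is None or j != i]
            inputs = rng.sample(pool, free)
        else:
            inputs = [rng.randrange(num_components) for _ in range(free)]
        if self_slot is not None:
            inputs.insert(self_slot, i)
        wiring.append(inputs)
    return wiring

def degenerate_fraction(wiring):
    """Fraction of components that have at least one repeated input."""
    return sum(len(set(inputs)) < len(inputs) for inputs in wiring) / len(wiring)

rng = random.Random(1)
fully_random = make_wiring(200, 3, rng)
partially_local = make_wiring(200, 3, rng, self_slot=1)   # second input is i
distinct_inputs = make_wiring(200, 3, rng, distinct=True)
print(degenerate_fraction(fully_random), degenerate_fraction(distinct_inputs))
```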


ANALYTIC  APPROXIMATION  OF  MACROSCOPIC  DYNAMICS 

The  ultimate  method  to  study  a  dynamical  system  is  to  run  the  time  evolution 
following  the  rule  that  updates  the  state  value  for  each  component.  The  simulation 
for 3-input, 2-state, nonlocal cellular automata has been carried out and the results
are summarized in Li.

If one is only interested in the dynamics of macroscopic quantities, for example,
the density of state 1, some alternative dynamical equations for that macroscopic
quantity can be derived. These dynamical equations for macroscopic quantities are
not equivalent to the original dynamical rules, but they will, in many cases, provide
valuable information about the original dynamics.

The dynamical equation for the density of state 1 can be called a return map:

$$d^{t+1} = F(d^t) \qquad (4)$$

where $d^t$ is the density of state 1 at time t, and F(.) is determined either
by actually running the rule f(.) or by some approximation scheme. Note that
different original rules f(.) can give the same macroscopic dynamics F(.).

One approximation scheme, called mean-field theory, assumes that all inputs are
independent of each other, and the probability of having state 1 when the n inputs
contain m state 1 and n − m state 0 is estimated by counting the percentage of
input configurations (containing m state 1 and n − m state 0) that are mapped to
state 1. For a general introduction, see Gutowitz.

To  illustrate  this  approximation  scheme,  let  me  use  the  following  rule  as  an 
example  (the  triplet  is  the  value  of  the  three  inputs,  and  the  number  below  the 
triplet  is  the  value  to  be  updated  to): 

111  110  101  100  011  010  001  000
 1    0    1    1    1    0    0    0        (5)

When  all  three  inputs  are  1,  the  state  value  will  be  1;  when  two  inputs  are  1  and 
one  input  is  0,  two  out  of  three  configurations  will  be  mapped  to  1;  when  one  input 
is  1  and  two  inputs  are  0,  one  out  of  three  configurations  will  be  mapped  to  1;  and 
when  all  three  inputs  are  0,  the  state  value  will  never  be  1.  It  is  easy  to  show  that 
one can approximate the return map by

$$d^{t+1} \approx (d^t)^3 + 2 (d^t)^2 (1 - d^t) + d^t (1 - d^t)^2. \qquad (6)$$

Simple manipulation (factor out $d^t$ and use $d^t + (1 - d^t) = 1$) shows that it leads to

$$d^{t+1} = d^t; \qquad (7)$$

that  is,  the  density  of  state  1  does  not  change  with  time. 
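The mean-field estimate can also be checked numerically. The sketch below sums, over every input configuration that the example rule maps to 1, the probability of that configuration under the independence assumption; the function name and the sample densities are illustrative assumptions.

```python
# Rule table of the example: (first, second, third input) -> new state.
RULE = {
    (1, 1, 1): 1, (1, 1, 0): 0, (1, 0, 1): 1, (1, 0, 0): 1,
    (0, 1, 1): 1, (0, 1, 0): 0, (0, 0, 1): 0, (0, 0, 0): 0,
}

def mean_field_map(rule_table, density):
    """Mean-field estimate of the next density of state 1.

    Each input is taken to be 1 with probability `density`, independently
    of the others, so a configuration with m ones among n inputs has
    probability density**m * (1 - density)**(n - m).
    """
    total = 0.0
    for inputs, output in rule_table.items():
        if output == 1:
            ones = sum(inputs)
            zeros = len(inputs) - ones
            total += density ** ones * (1.0 - density) ** zeros
    return total

densities = [0.0, 0.1, 0.37, 0.5, 0.9, 1.0]
mapped = [mean_field_map(RULE, d) for d in densities]
for d, m in zip(densities, mapped):
    print(d, m)
```

For this rule the printed pairs agree up to rounding, as Eq. (7) states. The same function applies to any rule table with 0/1 states, which is one way to compare the return maps of different rules.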

In  fact,  some  important  information  can  be  extracted  from  this  approximation 
of  return  maps.  If  the  return  map  has  a  stable  fixed-point  solution  equal  to  zero, 
the  limiting  density  of  state  1  should  be  zero  or  very  low.  That  is  the  case  when  the 
original  system  has  a  fixed-point  dynamics  with  zero-density  or  low-density  spatial 
configurations. 

If the return map has a non-zero, stable, fixed-point solution, there are two
possibilities for the original dynamics: (1) the original system has a fixed-point
dynamics with a spatial configuration about half filled with 0s and half with 1s,
and (2) the original system is chaotic, with some kind of “thermal equilibrium
states” being reached. Even though the state value for each component changes
constantly, the density of 1s is nevertheless a constant.

I have yet to discover a return map with chaotic solutions. Generally speaking,
it is very difficult for macroscopic quantities to fluctuate chaotically. Occasionally,
numerical simulation shows that macroscopic quantities such as the density of state
1 do fluctuate irregularly. However, this is always because these simulations are
carried out for systems of finite size. The magnitude of these irregular fluctuations
decreases as the system becomes larger and, in principle, they will disappear in
the infinite-size limit. See, however, the discussions in Bohr et al. and Kaneko.

Although the return map is not equivalent to the original dynamical rule, it
can provide valuable information. Because of the low dimensionality of the return
map, it is easier to study its own “bifurcation” phenomena (how the dynamics of
the return map changes with the parameter). From these studies, one can then
understand some aspects of the bifurcation phenomena in the original system. A
study of nonlocal cellular automata rule space following this strategy is carried out
in Li. In particular, it is partially understood why some bits in the rule table,
which are called “hot bits” in Li and Packard, are more important than others. It
is because the hot bits change the form of the return map more drastically than
other bits.


SYSTEMS  OF  COUPLED  SELECTORS 

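The cellular automaton rule defined in Eq. 5 can be written in another form:

$$x_i^{t+1} = \begin{cases} x^t_{j_1(i)} & \text{if } x^t_{j_2(i)} = 0, \\ x^t_{j_3(i)} & \text{if } x^t_{j_2(i)} = 1. \end{cases} \qquad (8)$$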

In  other  words,  if  the  second  input  is  in  state  0,  the  rule  transmits  the  state  value 
from the first input; if the second input is in state 1, the rule transmits the value
from  the  third  input.  This  rule  can  be  called  a  selector,  or  a  multiplexer,  with  the 
second  input  called  a  control  input;  that  is,  it  decides  which  input  to  select. 
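That the selector description reproduces the rule table of the previous section can be verified by enumerating the eight input triplets. This is a minimal sketch; the function name `selector` is an assumption made for the example.

```python
from itertools import product

def selector(first, control, third):
    """Multiplexer: pass `first` when control is 0, `third` when control is 1."""
    return third if control else first

# Enumerate triplets in the order 111, 110, 101, 100, 011, 010, 001, 000.
triplets = list(product((1, 0), repeat=3))
outputs = "".join(str(selector(a, b, c)) for a, b, c in triplets)
print(outputs)  # 10111000
```

An equivalent arithmetic form for 0/1 states is `first * (1 - control) + third * control`, which makes explicit that the control input decides which of the other two values is transmitted.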

This rule turns out to be the most intriguing 3-input, 2-state nonlocal cellular
automaton. A typical spatial-temporal pattern for this rule is shown in Figure 1.
Although the limiting dynamics is periodic, the transient dynamics looks chaotic.
This combination of long chaotic transients and simple limiting dynamics is typical
for systems on the “edge of chaos.”

From Figure 1, we can see some dark and light horizontal stripes. For
a dark stripe to form, when one component has state 1, other components must also
tend to have state 1, so that the total number of components with state 1
increases. This seemingly simple fact implies the existence of a certain cooperation
among components. Indeed, other components have no reason to follow suit
when one component switches from 0 to 1, unless they are dragged into doing so.
The emergence of higher-level structures is also a hallmark of edge-of-chaos
systems.

The transient time for systems of coupled selectors is observed to increase with
the size of the system. More careful simulation shows that the increase is almost
linear:

$$T_{av}(N) \sim N \qquad (9)$$

where $T_{av}(N)$ is the mean transient time for systems with size N. If we exclude all
degeneracies in choosing inputs, it has been observed that the increase of transient
time is more than linear:


$$T_{av}(N) \ \ldots \qquad (10)$$


FIGURE 1 A spatial-temporal pattern of the coupled selectors. The configuration of
the system (horizontal string) consists of components in state 1 (black) or 0 (white). The
updating of the configuration is represented by showing the configuration at each time
step (time increases downward).


There are many open issues concerning the transient behavior and I will not
discuss them at length here. Briefly, there are questions on how large the system
size should be in order to trust the scaling; what the distribution of transient times
is for a fixed size; whether this distribution with respect to wiring sampling is
different from that of initial configuration sampling; whether the mean transient
time is a better quantity than the median transient time, and whether one should
take the logarithm first, then do the average; how permitting degenerate
inputs changes the result; and so on.
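Several of these questions can be explored with a small experiment that measures transient times directly. The sketch below iterates a system of coupled selectors until a configuration repeats and then compares the mean, the median, and a log-average of the sampled transients. It is an illustration only: the system size, the sample count, the degeneracy-permitted random wiring, and the use of log(1 + T) to tolerate zero transients are all assumptions of the example.

```python
import math
import random
import statistics

def selector_step(config, wiring):
    """One synchronous update of the coupled selectors."""
    return tuple(config[j3] if config[j2] else config[j1]
                 for j1, j2, j3 in wiring)

def transient_and_cycle(config, wiring):
    """Iterate until a configuration repeats; return (transient, cycle length)."""
    first_seen = {}
    t = 0
    while config not in first_seen:
        first_seen[config] = t
        config = selector_step(config, wiring)
        t += 1
    transient = first_seen[config]
    return transient, t - transient

def sample_transients(size, samples, seed=0):
    """Transient times over random wirings and random initial configurations."""
    rng = random.Random(seed)
    transients = []
    for _ in range(samples):
        wiring = [tuple(rng.randrange(size) for _ in range(3))
                  for _ in range(size)]
        config = tuple(rng.randrange(2) for _ in range(size))
        transients.append(transient_and_cycle(config, wiring)[0])
    return transients

transients = sample_transients(size=12, samples=50)
mean_t = statistics.mean(transients)
median_t = statistics.median(transients)
mean_log_t = statistics.mean([math.log(1 + t) for t in transients])
print(mean_t, median_t, mean_log_t)
```

Storing every visited configuration is only practical for small systems; for larger N one would switch to a cycle-detection scheme that uses constant memory.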

In  Figure  1,  the  limiting  configuration  has  a  very  low  density  of  state  1.  If  we 
change  the  wiring  and  initial  condition,  and  run  the  simulation  again,  it  is  possible 
that  the  limiting  configuration  can  have  a  very  high  instead  of  a  very  low  density 
of  state  1.  These  two  types  of  configurations  are  called  consensus  states.  It  is  not 
clear  before  finishing  the  simulation  which  consensus  state  will  be  reached.  In  fact, 
it  has  been  observed  that  the  density  could  wander  up  and  down  in  such  a  way  that 
the  system  almost  hits  the  high-density  consensus  state  before  turning  the  trend  to 
eventually  reach  the  low-density  one! 


GROUP  MEETING  PROBLEMS 

Imagine a group of people having a meeting. The goal of the meeting is to find a
consensus opinion: either most of the people vote yes, or most of them vote no.
Requiring a 100 percent yes-vote or no-vote may not be realistic, so some compromise
is made: we allow the meeting to finish whenever the density of yes votes (or of no
votes) exceeds a certain threshold value.

The system of coupled selectors discussed in the last section can be recast into
a group meeting problem. At the beginning of the meeting, each person votes yes
or no randomly. Then each person chooses three other persons (he or she can also
choose himself or herself) as his or her “inputs.” Each one of the three inputs is
labeled as either the first, the second, or the third input. The second person is most
important: whenever he or she votes no, the first person's vote is followed; and
whenever he or she votes yes, the third person's vote is followed.
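
The meeting procedure just described can be sketched in a few lines of Python. This is only a minimal illustration: the function names, the synchronous updating, the group size, and the stopping threshold of 0.1 are assumptions made for the example, not values taken from the simulations reported here.

```python
import random

def make_wiring(n, rng):
    """Each person picks a (first, second, third) input at random; self-choice is allowed."""
    return [(rng.randrange(n), rng.randrange(n), rng.randrange(n)) for _ in range(n)]

def step(votes, wiring):
    """Synchronous update: follow the first input if the second votes no (0),
    and the third input if the second votes yes (1)."""
    return [votes[a] if votes[b] == 0 else votes[c] for (a, b, c) in wiring]

def run_meeting(n=200, threshold=0.1, max_steps=10_000, seed=0):
    """Run until the density of yes-votes is <= threshold or >= 1 - threshold.

    Returns (number of updates performed, final density). The threshold and
    max_steps defaults are arbitrary choices for this sketch.
    """
    rng = random.Random(seed)
    wiring = make_wiring(n, rng)
    votes = [rng.randint(0, 1) for _ in range(n)]
    for t in range(max_steps):
        density = sum(votes) / n
        if density <= threshold or density >= 1 - threshold:
            return t, density
        votes = step(votes, wiring)
    return max_steps, sum(votes) / n

if __name__ == "__main__":
    for seed in range(5):
        print(seed, run_meeting(seed=seed))
```

Running it with several seeds changes both the wiring and the initial votes, which is one way to see that the consensus reached is not known in advance.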

This somewhat bizarre way for a group meeting to proceed is, nevertheless, not
as trivial as one might have thought. First of all, will the group meeting ever reach
a consensus? By the result presented in the last section, the answer is yes. But
this is true only when the three inputs are randomly chosen. There are examples
where a consensus is never reached. For example, if the wiring is partially local,
i.e., $j_2(i) = i$, it is almost always the case that the dynamics is chaotic and the
density of state 1 is around 0.5.

Second,  even  if  a  consensus  is  reached,  do  we  know  which  one? 

For  the  system  of  coupled  selectors,  we  do  not  know  beforehand  whether  it  is 
an  all-yes  or  all-no  state  that  is  reached.  Both  low-  and  high-density  configurations 
are “traps” or “attractors” of the dynamics. If we consider the fluctuation of the
density  as  a  random  walk  (though  it  is  a  deterministic  random  walk  because  initial 
configuration,  wiring  diagram,  and  dynamical  rule  are  fixed  during  the  updating),  it 
has  a  50-50  chance  to  reach  either  the  low-density  or  the  high-density  configuration. 

The system of coupled selectors is not the only system to have two consensus
states. Actually, there exists a large class of “unbiased” rules that behave similarly
(by “unbiased,” I mean that the rule does not have any reason to prefer either one
of the consensus states). Interestingly, one such rule, a 7-input 2-state local cellular
automaton called the Gacs-Kurdyumov-Levin rule, was proposed more than ten years
ago. It is defined as follows:

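$$x_i^{t+1} = \begin{cases} \text{majority among } x_i^t,\ x_{i-1}^t, \text{ and } x_{i-3}^t & \text{if } x_i^t = 0; \\ \text{majority among } x_i^t,\ x_{i+1}^t, \text{ and } x_{i+3}^t & \text{if } x_i^t = 1. \end{cases} \qquad (11)$$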

It has been shown that the Gacs-Kurdyumov-Levin rule (11) has two attractors:
the all-zero and all-one configurations. The all-zero consensus state will be reached if
the initial density of state 1 is smaller than 0.5, and the all-one consensus state
will be reached if the initial density is larger than 0.5 (this result is proved for
the Gacs-Kurdyumov-Levin rule (11) in the infinite size limit).
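
A single update of the Gacs-Kurdyumov-Levin rule on a ring can be written as follows. This is a sketch under the assumption of periodic boundary conditions and synchronous updating; the function names are hypothetical. Since the consensus result is proved only in the infinite size limit, the sketch makes no claim that a given finite run converges.

```python
def gkl_step(x):
    """One synchronous update of the Gacs-Kurdyumov-Levin rule on a ring.

    A site in state 0 takes the majority of itself and the sites 1 and 3 to its
    left; a site in state 1 takes the majority of itself and the sites 1 and 3
    to its right.
    """
    n = len(x)
    new = []
    for i in range(n):
        if x[i] == 0:
            trio = (x[i], x[(i - 1) % n], x[(i - 3) % n])
        else:
            trio = (x[i], x[(i + 1) % n], x[(i + 3) % n])
        new.append(1 if sum(trio) >= 2 else 0)
    return new

def gkl_run(x, max_steps=1000):
    """Iterate until a fixed point or max_steps; returns (configuration, steps used)."""
    for t in range(max_steps):
        nxt = gkl_step(x)
        if nxt == x:
            return x, t
        x = nxt
    return x, max_steps
```

For instance, an isolated 1 in a background of 0s is erased in one step, and an isolated 0 in a background of 1s is filled in, which is the local mechanism behind the two consensus states.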

As with the systems of coupled selectors, consensus states may not be reached
for the Gacs-Kurdyumov-Levin rule if the wiring diagram is modified. For example, if
the majority is chosen among $x_i^t$, $x_{i-1}^t$, and $x_{i-2}^t$ when $x_i^t = 0$, and among $x_i^t$, $x_{i+1}^t$,
and $x_{i+2}^t$ when $x_i^t = 1$, then the limiting density will be more or less the same as
the initial density. So, if the initial configuration is random, the limiting density will
be around 0.5 instead of 0 or 1. We can see easily that the Gacs-Kurdyumov-Levin
rule can also be translated to a group meeting problem. If the periodic boundary
condition is used, we are going to have a “roundtable group meeting”!

With  limited  space  and  time,  I  can  only  introduce  a  few  topics  on  nonlocal 
cellular  automata.  There  are  other  major  topics  that  are  not  covered  here,  for 
example,  viewing  nonlocal  cellular  automata  as  computers,  and  the  structure  of 
nonlocal cellular automata rule space. Interested readers can find more details
in my paper. I hope I have conveyed to readers the richness of dynamical behaviors for
dynamical  systems  with  nonlocal  interaction,  and  I  hope  more  people  will  share  my 
excitement  in  studying  these  systems. 
Reality Kisses the Neck of Speculation:

A Report From the NKC Workgroup


During the 1991 Complex Systems Summer School, Stuart Kauffman lectured
on the family of NKC models. We found these ideas intriguing and
formed an NKC study group. This contribution to the lecture volume
summarizes some of the ideas and musings of this group.

It has been suggested that NKC is an acronym for “no known content.”
Whether or not this is the case, NKC models are (also) named for the
three important parameters of the discrete fitness landscape models that
are discussed in this paper.

This  summary  has  three  distinct  parts.  The  first  is  a  mathematically  formal 
description  of  NKC  models.  The  second  is  a  list  of  critiques  of  current  uses 
of NKC models. The third section suggests several new areas in which NKC
models may be useful. Space is limited, so we will have to ask our readers
to see other authors’ treatments of the basic NKC model.


1991  Lectures  in  Complex  Systems.  SFI  Studies  in  the  Sciences  of  Complexity, 
Lect.  Vol.  IV,  Eds.  L.  Nadel  &  D.  Stein,  Addison-Wesley,  1992 
1. A MATHEMATICAL FORMALISM

The formalism presented below is an attempt to add rigor to the NKC model. It
does this in two ways: first, by introducing a mathematical formalism that can be
used to find analytical results, and second, by specifying the model to a degree which
allows for more specific critiques of the underlying assumptions. This formalism is
based on our reading of the model as presented by Stuart Kauffman. We believe
that this is the first formalization of this kind, though several other approaches
exist.

In  the  description  of  the  formalism,  we  will  be  applying  the  NKC  model  to 
genotypes.  This  allows  us  to  use  a  familiar  vocabulary.  It  is  important  to  remind 
you  that  there  is  nothing  inherent  in  this  formalism  that  restricts  us  to  this  level 
of  biological  organization.  It  will  also  be  immediately  obvious  that  we  are  using 
caricature  genotypes.  They  are  simply  binary  strings;  we  will  have  more  to  say  on 
the  relationship  between  binary  strings  and  biological  genes  in  the  second  section 
of  the  paper. 

One of the goals of this formalization is to derive analytical results for the NK
family of models. Another is to formalize the operators that define the neighborhood
of a given gene: one-mutant neighbor, inversion, and crossing over. These operators
are defined below.


1.1  DEFINITIONS 

Let $N$ be a positive integer and $K$ be a non-negative integer. $N$ denotes the number
of genes in the genotype of an organism (see Figure 1), and $K$ denotes the number
of other genes (see Figure 2) on which the fitness contribution (which will
be specified later) of each gene depends, where $0 \le K \le N - 1$. Thus, $K$ measures the
richness of epistatic interactions among genes. $X_N = \{0, 1\}^N$ corresponds to the
configuration space of the genotype with $N$ genes, and $X_{K+1} = \{0, 1\}^{K+1}$ is the
configuration space of the collection of the $K + 1$ genes on which the fitness
contribution of each gene bears.

In a coevolutionary system, a positive integer $S$ denotes the number of species
(see Figure 3). Here we are using the binary string to represent the genes in a
species, not the bases in a gene or amino acids in a protein. The genotype of
each species is represented as a binary string, where each element stands for a gene.


FIGURE  1  An  example  of  a  genotype  (binary  string)  of  length  N. 
FIGURE 2 An example of several epistatic interactions; here K = 4.


FIGURE 3 A coevolutionary system, where the positive integer S denotes the number of
species.


However, the model could be used at those levels. Often it is easier to visualize
the relations if we think of the binary strings as analogous to DNA sequences.
But Kauffman and Johnson’s coevolution model describes the genes as being the
elements of the binary string. If this seems confusing, the confusion arises from the fact
that their descriptions move easily between different levels of organization. This
shift may be unfortunate, but it illustrates the power of this model at different
levels of organization. Remember that instead of species we could think of these as
being chromosomes interacting or different genes on a chromosome interacting.

Each gene in the $i$th species depends on $K$ genes internally and on $C$ genes in
each of $S_i$ ($\in \{1, \ldots, S\}$) other species, where $S$ is the total number of species. $S_i$
refers to the other species with which species $i$ interacts. That is to say,
the positive integer $C$ is the number of genes in each of the other species on which
the fitness contribution of each gene depends, and $S_i$ is the number of other species with
which the $i$th species interacts. Let $X_{C \times S_i} = \{0, 1\}^{C \times S_i}$ denote the configuration
space of the $C$ genes in other species on which the fitness contribution of each gene
bears.
We now introduce random variables to describe the elements of these interacting
binary strings.

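For each $j \in \{1, \ldots, S\}$, $\eta^{(j)} \in X_{N_j}$ is the genotype of the $j$th species with $N_j$
genes, i.e.,

$$\eta^{(j)} = (\eta^{(j)}(1), \eta^{(j)}(2), \ldots, \eta^{(j)}(N_j)) \in X_{N_j} \quad \text{with} \quad X_{N_j} = \{0, 1\}^{N_j}.$$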

We  illustrate  this  in  Figure  4. 

In order to describe the interactions between elements of the string (i.e.,
interaction between genes within a chromosome), we define

$$A_i^{(j)}(K) = \{i_0^{(j)}, i_1^{(j)}, \ldots, i_K^{(j)}\}$$

with $i_0^{(j)} = i$ and $i_r^{(j)} \neq i_s^{(j)}$ if $r \neq s$.

Here $A_i^{(j)}(K)$ denotes the collection of indices which affect the fitness
contributions of the $i$th gene in the $j$th species. And

$$(\eta^{(j)}(l) : l \in A_i^{(j)}(K)) = (\eta^{(j)}(i_0^{(j)}), \ldots, \eta^{(j)}(i_K^{(j)})) \in X_{K+1}$$

is the configuration which bears on the fitness contributions of the $i$th gene in the
$j$th species (see Figure 5).

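Similarly,

$$A_i^{(j)}(C) = \{i_{1,s}^{(j)}, i_{2,s}^{(j)}, i_{3,s}^{(j)}, \ldots, i_{C,s}^{(j)} : s = 1, \ldots, S_j\}$$

and

$$(\eta^{(s)}(l) : l \in A_i^{(j)}(C)) = (\eta^{(s)}(i_{1,s}^{(j)}), \ldots, \eta^{(s)}(i_{C,s}^{(j)})).$$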


FIGURE 4 This diagram depicts several binary strings. Each of these strings
represents a genotype.
Where N = 7 and K = 4, we would then write this as:

$$(\eta^{(j)}(l) : l \in A_3^{(j)}(4)) = (\eta^{(j)}(1), \eta^{(j)}(3), \eta^{(j)}(4), \eta^{(j)}(6), \eta^{(j)}(7)).$$

1.1.1 THE OPERATORS There are several operators that define which strings are
accessible from a given string. These define the neighborhood of a given string.
These  operators  are  named  “mutation,”  “crossover,”  and  “inversion.”  These  terms 
take  their  inspiration  from  looking  at  the  NK  model  as  describing  DNA.  This  is  a 
different  organizational  level  from  the  one  we  have  been  using  above. 

New  variations  in  the  genotype  can  be  introduced  by  mutation,  crossing  over, 
and  inversion.  These  operations  define  searchable  neighborhoods. 

1. One-mutant neighbor operator:

$$\eta^{(j)} \mapsto \eta_m^{(j)} : X_{N_j} \to X_{N_j}, \quad \text{where } m \in \{1, \ldots, N_j\},$$

is defined by a change in state of the individual gene (0 → 1 or 1 → 0) within
the genotype. If $i \neq m$ (there is no change in the state of the gene),

$$\eta_m^{(j)}(i) = \eta^{(j)}(i);$$

if $i = m$ (invert the state of the $m$th gene),

$$\eta_m^{(j)}(i) = 1 - \eta^{(j)}(i).$$

Note that this operation simply flips the bit at position $m$.

2. Crossover operator: for two genotypes $\eta^{(j)}$ and $\eta^{(j')}$ broken at the point $a$, one
product of the crossing over is

$$(\eta^{(j)}(1), \ldots, \eta^{(j)}(a), \eta^{(j')}(a + 1), \ldots, \eta^{(j')}(N_j)).$$

The other product of this crossing over is

$$(\eta^{(j')}(1), \ldots, \eta^{(j')}(a), \eta^{(j)}(a + 1), \ldots, \eta^{(j)}(N_j)).$$

The point at which the two strings are broken is $a$, and we get two new genotypes by
rejoining the substrings. In our description of binary strings as being genotypes
belonging to different species, it is not obvious how the crossing-over operator
would be relevant. But as mentioned above, the NKC model has different levels
of applicability. In this case, thinking of the binary strings as being analogs for
bases in a gene on a chromosome would be more helpful.

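3. Inversion operator: for $a \le b$ and $a, b \in \{1, \ldots, N_j\}$, the inversion operator maps
$X_{N_j}$ to $X_{N_j}$ and is defined by:

$$\eta^{(j)} \mapsto (\eta^{(j)}(1), \ldots, \eta^{(j)}(a), \eta^{(j)}(b), \eta^{(j)}(b - 1), \ldots, \eta^{(j)}(a + 1), \eta^{(j)}(b + 1), \ldots, \eta^{(j)}(N_j)).$$

When $a = b$, the inversion operator reduces to the identity operator. This
operator is applied to a single string. Breaking and rejoining occur immediately
after the points $a$ and $b$, and the string of genes between points $a$ and $b$ gets
inverted.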
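
The three operators can be sketched on bit tuples as follows. This is an illustrative reading, not a definitive implementation: positions are 1-based as in the text, the function names are hypothetical, and we assume that “inverted” means the order of the genes at positions a+1 through b is reversed, which is consistent with the operator reducing to the identity when a = b.

```python
def one_mutant(eta, m):
    """Flip the bit at 1-based position m; all other positions are unchanged."""
    if not 1 <= m <= len(eta):
        raise ValueError("m must lie in 1..N")
    return eta[:m - 1] + (1 - eta[m - 1],) + eta[m:]

def crossover(eta1, eta2, a):
    """Break two equal-length strings after position a and rejoin the pieces."""
    if len(eta1) != len(eta2) or not 1 <= a <= len(eta1):
        raise ValueError("strings must have equal length and a must lie in 1..N")
    return eta1[:a] + eta2[a:], eta2[:a] + eta1[a:]

def inversion(eta, a, b):
    """Break after positions a and b (a <= b) and reverse the genes in between.

    Assumption: the reversed segment is positions a+1..b, so a == b is the identity.
    """
    if not 1 <= a <= b <= len(eta):
        raise ValueError("need 1 <= a <= b <= N")
    return eta[:a] + eta[a:b][::-1] + eta[b:]
```

For example, `one_mutant((0, 0, 0), 2)` gives `(0, 1, 0)`, and `crossover((0, 0, 0, 0), (1, 1, 1, 1), 1)` gives the pair `(0, 1, 1, 1)` and `(1, 0, 0, 0)`.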

Furthermore,  we  can  define  the  following  subspaces  of  X  corresponding  to  the 
above-mentioned  operators. 

1.  One-mutant  neighbor  space: 

=  . JV,) 

2.  Crossover  space: 

3.  Inversion  space: 

X, <  6,a,6  G  {1,  -  -  • ,  iV,  }} 
1.1.2 FITNESS The fitness contribution of the $i$th gene of the $j$th genome (species) is a
combination of all of the interactions internal to the genotype $j$ and the interactions
with genotypes of each of the other species, $S_j$. The fitness function assigns a value
of fitness contribution to each state of the set of genes that influence a given gene.
This may become a little convoluted, so we have included some specific examples
below.

We now define the fitness contribution of the $i$th gene in the $j$th species:

1 

This is a combination of fitness contributions from internal interactions (as a
function of $K$) and interspecies interactions (as a function of $C$), where

=  W.(7?(^)(p):pe  aS'^(A')) 

-  ft  =  i,...,yv 

Wi(r/^)-.  A;.'\A')),  aS'HC)):  0  =  1 . S 

l/=l....,5,- 

ir)^^Hp):peA\^\K))€XK^, 

{ri^‘Hq):qeA^‘\C))eXc 

$\mathcal{R}$ is the collection of (0,1)-valued random variables. In general, the cardinality
of $\mathcal{R}$, $|\mathcal{R}|$, is a very large number. Commonly, for the sake of simple analysis, we
may assume that $\mathcal{R}$ is a set of independent, identically distributed (IID) random
variables.

The fitness of the $j$th species is defined as the average fitness contribution of
each gene:

$$W^{(j)} = \frac{1}{N_j} \sum_{i=1}^{N_j} W_i^{(j)}. \qquad (1.2)$$

Hereafter, the above-mentioned NK family model will be called the $M = (N, K, C, S, S_i)$
model. Sometimes it is called the NKC model for short.


1.2 EXAMPLES OF THE M = (N, K, C, S, S_i) MODEL

This  section  gives  some  examples  of  the  NK  family  of  models.  We  do  this  to  help 
the  reader  tether  some  of  the  formalism  to  more  concrete  situations. 
1.2.1 THE BASIC NK MODEL: M(N, K, 0, 1, 0) In this model, the number of species
is one, so we will omit the superscript $(j)$ so that:

$$\eta = \eta^{(1)}, \quad W_i(\eta) = W_i^{(1)}(\eta^{(1)}),$$

and so on.

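By definitions (1.1) and (1.2), we have

$$W(\eta) = \frac{1}{N} \sum_{i=1}^{N} W_i(\eta). \qquad (1.3)$$

Furthermore, it is easily obtained that for $m = 1, \ldots, N$,

$$W(\eta_m) - W(\eta) = \frac{1}{N} \sum_{i:\, m \in A_i(K)} \{W_i(\eta_m) - W_i(\eta)\}. \qquad (1.4)$$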

Next,  we  will  consider  this  model  in  more  detail. 

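CASE A: K = 0 Equations (1.3) and (1.4) imply that for $m = 1, \ldots, N$,

$$W(\eta) = \frac{1}{N} \sum_{i=1}^{N} W_i(\eta(i)), \qquad (1.5)$$

$$W(\eta_m) - W(\eta) = \frac{1}{N} \{W_m(1 - \eta(m)) - W_m(\eta(m))\}, \qquad (1.6)$$

where $W_i(\eta(i)) = W_i((\eta(i)))$.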


TABLE 1 In this table we calculate all of the fitnesses for all of the possible
strings for the case where N = 3, K = 0, C = 0, S = 1, and S_i = 0.

η     W_1      W_2      W_3      W(η)

000   W_1(0)   W_2(0)   W_3(0)   (1/3){W_1(0) + W_2(0) + W_3(0)}
001   W_1(0)   W_2(0)   W_3(1)   (1/3){W_1(0) + W_2(0) + W_3(1)}
010   W_1(0)   W_2(1)   W_3(0)   (1/3){W_1(0) + W_2(1) + W_3(0)}
011   W_1(0)   W_2(1)   W_3(1)   (1/3){W_1(0) + W_2(1) + W_3(1)}
100   W_1(1)   W_2(0)   W_3(0)   (1/3){W_1(1) + W_2(0) + W_3(0)}
101   W_1(1)   W_2(0)   W_3(1)   (1/3){W_1(1) + W_2(0) + W_3(1)}
110   W_1(1)   W_2(1)   W_3(0)   (1/3){W_1(1) + W_2(1) + W_3(0)}
111   W_1(1)   W_2(1)   W_3(1)   (1/3){W_1(1) + W_2(1) + W_3(1)}
For example, if $N = 3$, $K = 0$, $m = 2$,

$$\eta = (0, 0, 0) \quad \text{and} \quad \eta_2 = (0, 1, 0),$$

then

$$W((0, 1, 0)) - W((0, 0, 0)) = \frac{1}{3} \{W_2(1) - W_2(0)\}. \qquad (1.7)$$

For  more  details  refer  to  Table  1. 
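
The K = 0 case is simple enough to check numerically. The sketch below draws the per-gene contributions at random (the uniform distribution, the seed, and all function names are assumptions made only for illustration), rebuilds the rows of Table 1, and confirms the relation in Eq. (1.7).

```python
import itertools
import random

def make_k0_landscape(n, rng):
    """table[i][s] is the contribution of gene i+1 when its own state is s (K = 0)."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def fitness(eta, table):
    """Average of the per-gene contributions, as in Eq. (1.5)."""
    return sum(table[i][s] for i, s in enumerate(eta)) / len(eta)

def flip(eta, m):
    """One-mutant neighbor: flip the bit at 1-based position m."""
    return eta[:m - 1] + (1 - eta[m - 1],) + eta[m:]

def table_1(table):
    """Fitness of every string of length len(table), as in Table 1."""
    n = len(table)
    return {eta: fitness(eta, table) for eta in itertools.product((0, 1), repeat=n)}

rng = random.Random(1)
table = make_k0_landscape(3, rng)
eta = (0, 0, 0)
eta_2 = flip(eta, 2)                      # (0, 1, 0)
lhs = fitness(eta_2, table) - fitness(eta, table)
rhs = (table[1][1] - table[1][0]) / 3     # (1/3){W_2(1) - W_2(0)}
```

Up to floating-point rounding, `lhs` and `rhs` agree: with K = 0, flipping gene m changes only the contribution of gene m itself.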

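CASE B: K = N − 1 Similarly, for $m = 1, \ldots, N$ we have

$$W(\eta) = \frac{1}{N} \sum_{i=1}^{N} W_i(\eta), \qquad (1.8)$$

$$W(\eta_m) - W(\eta) = \frac{1}{N} \sum_{i=1}^{N} \{W_i(\eta_m) - W_i(\eta)\}, \qquad (1.9)$$

where

$$W_i(\eta) = W_i(\eta : A_i(N - 1)).$$

For example, if $N = 3$, $K = 2$, $m = 2$, $\eta = (0, 0, 0)$, and $\eta_2 = (0, 1, 0)$, then

$$W((0, 1, 0)) - W((0, 0, 0)) = \frac{1}{3} \sum_{i=1}^{3} \{W_i((0, 1, 0)) - W_i((0, 0, 0))\}. \qquad (1.10)$$

In general, see Table 2.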

TABLE 2 In this table we calculate some of the fitnesses of the possible strings for
the case N = 3, K = N − 1, C = 0, S = 1, and S_i = 0.

η     W_1          W_2          W_3          W(η)

000   W_1(0,0,0)   W_2(0,0,0)   W_3(0,0,0)   (1/3){W_1(0,0,0) + W_2(0,0,0) + W_3(0,0,0)}
001   W_1(0,0,1)   W_2(0,0,1)   W_3(0,0,1)   (1/3){W_1(0,0,1) + W_2(0,0,1) + W_3(0,0,1)}
111   W_1(1,1,1)   W_2(1,1,1)   W_3(1,1,1)   (1/3){W_1(1,1,1) + W_2(1,1,1) + W_3(1,1,1)}
CASE C: 0 < K < N − 1 (THE GENERAL CASE) This case is more complicated than
the previous K = 0 and K = N − 1 cases. For example, assume that

$$A_i(K) = \{i - l, \ldots, i, \ldots, i + r\}$$

with $l \ge 0$, $r \ge 0$, and $l + r = K$, and we adopt the periodic boundary condition. Then
for $m = 1, \ldots, N$, we obtain

$$W(\eta_m) - W(\eta) = \frac{1}{N} \sum_{i=m-r}^{m+l} \{W_i(\eta_m) - W_i(\eta)\} \qquad (1.11)$$

$$= \frac{1}{N} \sum_{i=m-r}^{m+l} \{W_i((\eta_m : A_i(K))) - W_i((\eta : A_i(K)))\}. \qquad (1.12)$$

For example, if $N = 5$, $K = 2$ ($l = r = 1$), $m = 5$, $\eta = (0, 0, 0, 0, 0)$, and $\eta_5 =
(0, 0, 0, 0, 1)$, then

$$W(\eta_5) - W(\eta) = \frac{1}{5} \big[ \{W_4((0, 0, 1)) - W_4((0, 0, 0))\} + \{W_5((0, 1, 0)) - W_5((0, 0, 0))\} + \{W_1((1, 0, 0)) - W_1((0, 0, 0))\} \big].$$

In the above cases, $A_i(K)$ is chosen deterministically. In particular, if $K =
2r$ ($l = r$), then $A_i(K)$ is a symmetrically chosen epistatic set.
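
The locality expressed by Eqs. (1.11) and (1.12) can be checked with a small sketch. It assumes the deterministic neighborhood with l genes to the left and r to the right on a ring, and draws each fitness contribution lazily from a uniform distribution; the class and function names, the seed, and the uniform choice are all assumptions made for illustration.

```python
import random

def epistatic_set(i, n, l, r):
    """A_i(K) = {i-l, ..., i, ..., i+r} with periodic boundaries (1-based indices)."""
    return [((i - 1 + d) % n) + 1 for d in range(-l, r + 1)]

def affected_genes(m, n, l, r):
    """Genes whose epistatic set contains position m."""
    return [i for i in range(1, n + 1) if m in epistatic_set(i, n, l, r)]

class NKLandscape:
    def __init__(self, n, l, r, seed=0):
        self.n, self.l, self.r = n, l, r
        self.rng = random.Random(seed)
        self.table = {}  # (gene, local configuration) -> contribution

    def local_configuration(self, i, eta):
        return tuple(eta[p - 1] for p in epistatic_set(i, self.n, self.l, self.r))

    def contribution(self, i, eta):
        key = (i, self.local_configuration(i, eta))
        if key not in self.table:          # draw each table entry once, on demand
            self.table[key] = self.rng.random()
        return self.table[key]

    def fitness(self, eta):
        return sum(self.contribution(i, eta) for i in range(1, self.n + 1)) / self.n

land = NKLandscape(n=5, l=1, r=1, seed=2)
eta = (0, 0, 0, 0, 0)
eta_5 = (0, 0, 0, 0, 1)
lhs = land.fitness(eta_5) - land.fitness(eta)
rhs = sum(land.contribution(i, eta_5) - land.contribution(i, eta)
          for i in affected_genes(5, 5, 1, 1)) / 5
```

With N = 5 and l = r = 1, flipping gene 5 affects only genes 4, 5, and 1, whose local configurations become (0, 0, 1), (0, 1, 0), and (1, 0, 0), and `lhs` equals `rhs` up to rounding.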

On the other hand, we can consider the case:

$$\tilde{A}_i(K) = A_i(K) \setminus \{i\} \ (= \{i_1, \ldots, i_K\}),$$

$$\Lambda_i(N) = \{1, \ldots, N\} \setminus \{i\},$$

where $A \setminus B = \{x : x \in A \cap B^c\}$ for sets $A$ and $B$, that is, the set of points that
belong to $A$ but not to $B$. (This is sometimes called the difference.)

In this case, $\tilde{A}_i(K)$ is a randomly chosen epistatic set, i.e., there is a random
variable $Y$ such that

$$P[Y = \tilde{A}_i(K)] = p_{\tilde{A}_i(K)} > 0 \quad \text{and} \quad \sum p_{\tilde{A}_i(K)} = 1.$$

Notice that the random cases of $K = 0$ and $K = N - 1$ coincide with the deterministic
ones.

We  hope  that  this  formalism  and  our  examples  using  it  will  clarify  some  of 
the  descriptions  of  NK  and  NKC  models.  This  formalism  has  already  been  used  to 
achieve  some  analytical  results. 
2. COMMENTS ON NKC MODELS

The  comments  in  this  section  are  a  mixture  of  two  forms.  Some  of  our  comments 
suggest  areas  where  the  NK/NKC  models  seem  to  diverge  from  the  systems  they 
intend  to  model  so  much  that  the  model  should  be  altered;  others  explore  some  of 
the  details  of  the  model  structure. 

As a backdrop for these comments, it is useful to review some of the stated
goals of NK models. Kauffman and Weinberger listed the following issues:

1.  How  many  local  optima  exist  in  a  landscape? 

2.  What  is  the  distribution  of  optima  in  the  landscape?  Are  they  near  one  another 
in  special  subregions  of  the  space  or  randomly  scattered? 

3.  What  are  the  lengths  of  uphill  walks  to  local  optima? 

4. As an optimum is approached, the fraction of fitter neighbors must dwindle to
0. How rapidly does the fraction of fitter mutants dwindle?

5.  Because  the  fraction  of  neighbors  which  are  fitter  dwindles  to  0,  there  is  some 
characteristic  relation  between  the  number  of  mutations  “tried”  and  the  number 
“accepted”  on  an  adaptive  walk.  How  are  the  two  related? 

6. How many alternative optima are accessible from a given starting point? Can
a “low-fit” peptide typically climb to all possible local optima, or only a small
fraction of those optima? Among the accessible alternative optima, how often
will each be “hit” on independent adaptive walks from the same starting point?

7.  How  many  of  the  possible  peptides  can  climb  to  any  specific  optimum,  including 
the  global  optimum?  A  small  fraction?  Almost  all? 

8.  Since  most  adaptive  walks  end  on  local  optima,  what  are  the  fitnesses  of  such 
optima  and  how  do  they  compare  with  the  global  optimum  in  the  space? 

9.  The  one-mutant  variants  of  a  local  optimum  must  be  less  fit  than  the  optimum. 
But  do  all  of  the  variants  lead  to  nearly  the  same  loss  of  fitness  or  is  there  high 
variance  indicating  precipitous  cliffs  and  gentle  ridges  in  different  directions  in 
the  high-dimensional  space? 

This  list  indicates  some  of  the  initial  goals.  Each  one  of  them  is  an  opportunity 
to  question  the  underlying  assumptions.  We  offer  some  of  those  questions  below. 
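
For very small N, the first question in the list (how many local optima exist) can be answered by exhaustive enumeration. The sketch below is only meant to fix ideas: the two fitness functions are toy stand-ins that we made up for illustration, not NK landscapes, and a local optimum is taken to be a string that is strictly fitter than all of its one-mutant neighbors.

```python
import itertools

def one_mutant_neighbors(eta):
    """All strings at Hamming distance 1 from eta."""
    for m in range(len(eta)):
        yield eta[:m] + (1 - eta[m],) + eta[m + 1:]

def is_local_optimum(eta, fitness):
    """True if every one-mutant neighbor is strictly less fit."""
    f = fitness(eta)
    return all(fitness(nb) < f for nb in one_mutant_neighbors(eta))

def local_optima(n, fitness):
    """Exhaustive search over all 2**n strings; feasible only for small n."""
    return [eta for eta in itertools.product((0, 1), repeat=n)
            if is_local_optimum(eta, fitness)]

def smooth(eta):
    """Toy additive fitness: the fraction of 1s (each gene contributes independently)."""
    return sum(eta) / len(eta)

def rugged(eta):
    """Toy interacting fitness: fraction of cyclically adjacent genes that agree."""
    n = len(eta)
    return sum(eta[i] == eta[(i + 1) % n] for i in range(n)) / n
```

On strings of length 4, the additive toy fitness has the single optimum (1, 1, 1, 1), whereas the interacting one has two, (0, 0, 0, 0) and (1, 1, 1, 1).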

2.1  SWARMS  IN  STATE  SPACE 

It should be clear that populations never settle down to the kind of equilibrium
that allows them to be fairly represented as a single point in a high-dimensional
binary state space. Each point in the space is a string of length D, the dimension of
the space. Rather, a population is really a swarm across this lattice, and the swarm
is localized in part of the space with a Hamming distance radius less than some
ε. (The Hamming distance is a measure that counts the number of non-identical
bits in two binary strings.) This approach allows us to imagine the bifurcation of a
swarm and even imagine a population space interaction that excludes some strings.
There might be lethal strings that also lie within ε and divide the space. We should
like to see future modeling of NKC models grapple with this.
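
As a small illustration of the swarm picture, the sketch below measures how far a population of strings spreads around a reference string. Taking the bitwise majority string as the center of the swarm is our own assumption for the example; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def consensus(population):
    """Bitwise majority string of the population (ties go to 1)."""
    n = len(population)
    return tuple(1 if 2 * sum(column) >= n else 0 for column in zip(*population))

def swarm_radius(population, center):
    """Largest Hamming distance from the center to any member of the swarm."""
    return max(hamming(genotype, center) for genotype in population)

population = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 1, 0, 0), (1, 0, 0, 0)]
center = consensus(population)             # (0, 0, 0, 0)
radius = swarm_radius(population, center)  # 1
```

A swarm is localized in the sense of the text when this radius stays below the chosen ε.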

The NKC model, in effect, assumes that the population sizes do not change
(they are monotypic and genes get fixed after each mutation). A more realistic
assumption would be that populations change as a monotonic function of fitness.
The importance of a particular species to the fitness of another species (through C
linking genes) should be related to the population of that species. Thus, if a frog’s
sticky tongue at time t drives flies to near extinction at time t + 1, that sticky
tongue contributes less to the frog’s fitness at t + 1 than at t.

It is important that these population dynamics be incorporated into the model
because it deals with populations whose fitness, and therefore sizes, would be
fluctuating dramatically, and the resulting dynamics would affect whether and what
types of equilibrium arise. It is exactly this long-term behavior that Kauffman is
interested in (e.g., “frozen state” vs. “near frozen” vs. “liquid state”).


2.2  WORD  CHUNKING 

One of the reasons for modeling with binary strings is the feeling that the results
are easily portable to situations where more than a single bit is necessary at each
location to describe the system. This can easily be seen when we apply the NK
model to different biological levels. If the binary string represents the four bases
in a strand of DNA, then we need two-bit words to specify the four bases (e.g.,
00: adenine, 01: cytosine, 10: thymine, 11: guanine). If we were to be modeling the
amino acids in proteins, we would need five-bit words (this would even leave a few
extra redundant or meaningless words). If we imagine that we are looking at the
whole set of proteins, we might need very long words indeed.

The  question  then  arises:  which  of  the  manipulations  that  we  apply  to  our 
binary  strings  (mutation,  crossing  over,  inversion)  are  unchanged  if  we  need  to  think 
with  longer  words?  Are  the  conclusions  claimed  for  NK  models  and  especially  NKC 
models  complicated  by  the  need  to  preserve  behavior  for  a  range  of  word  sizes?  We 
are particularly concerned about the meaning of the basic operators on different
length words. Inversion and crossing over may be deleterious far more often when
more than one bit is used to represent a basic unit of information.
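
The concern about word boundaries can be made concrete with the two-bit coding of the four bases mentioned above. In this sketch (function names are hypothetical), a crossover at a word boundary only recombines bases that the parents already carry, whereas a crossover in the middle of a word produces bases that neither parent had.

```python
# Two-bit words for the four bases, following the coding given in the text.
BASES = {(0, 0): "A", (0, 1): "C", (1, 0): "T", (1, 1): "G"}

def decode(bits):
    """Read a bit string of even length as a sequence of two-bit words."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return "".join(BASES[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2))

def crossover(x, y, a):
    """Break both strings after bit position a and rejoin the pieces."""
    return x[:a] + y[a:], y[:a] + x[a:]

x = (0, 1, 0, 1)   # "CC"
y = (1, 0, 1, 0)   # "TT"

at_boundary = [decode(s) for s in crossover(x, y, 2)]   # ["CT", "TC"]
mid_word = [decode(s) for s in crossover(x, y, 1)]      # ["AT", "GC"]
```

The mid-word crossover acts like a crossover plus point mutations at the level of bases, which is one reason the operators need not mean the same thing for different word sizes.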


2.3  CHANGE  IN  N  THROUGH  EVOLUTION 

An assumption of the NK models and (even more unlikely) of the NKC models
is that there is a single value N and that it does not change in the course of the
simulation. N is the length of the binary string. The length of the genome changes
in evolution, and different species will not share the same N.
2.4 DISTRIBUTION OF K

Random distributions of fitness values do not reflect “realistic” fitness landscapes.
The use of random assignments to model epistatic interactions was motivated by
an admission of profound ignorance. However, given that we do not know and
possibly cannot know realistic values for the fitness contribution of an allele, it is
also not clear that a random distribution of fitness represents the behavior of “real”
systems well. It may be that conditions defining biological systems are just a small
subset of these random assignments. Conclusions may thus be based on an atypical
set of data. (It would be worthwhile examining the sensitivity of these conclusions
to different assumptions about fitness landscapes.)

A closely related question is whether there are distributions that more closely
fit our intuition of epistatic interactions. Clearly, using a single mean value (K, in
the formalism) to represent the epistatic interactions is a caricature. Assuming a
distribution on the number of interactions would be a better approximation. This
distribution would likely be skewed, so that most genes are unlinked and a few
would be highly linked (see Figure 6).

A  similar  argument  can  be  made  for  the  parameter  C;  however,  it  is  more 
difficult  to  recommend  a  shape  for  the  distribution.  Clearly  some  members  of  a 
community  are  linked  in  many  ways  while  many  members  are  peripherally  linked 
if  they  are  linked  at  all. 

FIGURE 6 The standard use of a single value for K in current models, and one possible
distribution of K values (axis label: epistatic interactions).
FIGURE 7 This diagram represents six different two-dimensional embeddings (tilings)
of neighbors that preserve the notion of neighborhood for C = 2, 3, 4, 5, 6, 8. We have
not found one for 7 yet.


2.5  WHAT  ARE  THE  VALUES  OF  K  AND  C  FUNCTIONS  OF? 

A species faces adaptive choices which affect its interconnectedness with the rest
of the ecosystem. It is therefore possible that a system, rather than changing the
ruggedness of its landscape (e.g., by changing K) to suit its interconnectedness
with the ecosystem (C), would do exactly the opposite. More plausible is that both
change.


2.6  REPRESENTING  THE  NEIGHBORHOOD  OF  INTERACTING  SPECIES  ON 
LATTICES 

When looking at islands of chaos in seas of stability, a rectangular grid is used. To
have the intuitive feeling of neighborhood and also vary the number of interacting
species, it is important to use an embedding in two dimensions that preserves the
notion of neighbors. In Figure 7 we indicate some approaches to neighborhoods on
planes that preserve our notion of closeness. (Notice that in the two-neighbor case
a circle is a natural way to have periodic boundary conditions.) Previous work has
dealt with the four-neighbor case (we think) because it was convenient to display
and discuss, since drawing it spans the plane.

2.7  THE  ROBUSTNESS  AND  EXISTENCE  OF  Kopt 

Kauffman suggests that there is an optimum Kopt towards which a species will
evolve. This conclusion is derived under the assumption that no particular epistatic
interaction or set of interactions is especially important. However, it is plausible
that the epistatic interaction of specific traits might be much more important than
a particular K value (i.e., Kopt). Thus, a successful large combination (much larger
than Kopt) of epistatic interactions might overwhelm the disadvantages of a large
K. Therefore, it is possible that K ≫ Kopt. It would therefore be worth asking how
the force of the “attraction” towards Kopt varies with distance from K. A further
question is how it would affect the dynamics of the system as a whole to have
certain species “stuck” at K ≠ Kopt.

It is not correct to infer the attraction of a Kopt from observations of higher
fitness scores for species which have K’s closer to Kopt. Presumably, the amount of
epistatic interactions within a species is altered by the development of new traits
(or changes in old ones) which interact with existing traits (or the disappearance
of old traits which interacted with other traits). A change in K is not qualitatively
equivalent to a single mutation in epistatic interaction with a single other gene. An
incremental “change” in K (i.e., +1 or −1) results in every single gene changing
its epistatic interaction with a single other gene. Further, this is not a change in
an evolutionary sense, since a species in the NK model cannot adapt by “changing
its K.” A test of the “Kopt conjecture” at the appropriate level of analysis would
operate through “epistatic adaptation,” one pair of genes at a time.


2.8  WHAT  OTHER  OPERATIONS  DEFINE  THE  GENOTYPE 
NEIGHBORHOOD? 

The  one-mutant  neighborhood  is  useful  in  that  it  defines  what  local  genotype  space 
can be explored. Evolution in the NK model is constrained to pass via 1-mutant fitter
variants. But there are other mechanisms, as important as mutation,
that  define  the  local  neighborhood.  (Here  the  neighborhood  is  the  set  of  genotypes 
that  can  be  reasonably  explored  in  a  single  generation.) 

The fraction of local genotype space to be explored is a function of population
size, genome size, mechanisms of exploration, chance, etc. Since the part of
the genotype space explored is not necessarily exhaustive, movement to fitter
neighbors rather than fittest neighbors seems a bit more accurate.

Mechanisms of exploration allow us to cross the discrete bit space. Mutation is
one  of  those  mechanisms  and  the  one-mutant  neighborhood  is  the  “local”  space  that 
is  explored  in  the  standard  NK  model.  If  we  combine  inversions  or  diploid  genotypes 
with  crossing  over,  we  can  then  jump  far  across  the  space  in  a  single  step.  So,  in  our 
earlier discussion on formalism, we added the inversion and crossing-over operators.
These operators act as trap doors (worm holes if you wish) which connect parts of
the space that may be far apart with respect to Hamming distance. Population size
acts as a constraint inasmuch as there are a finite number of offspring. Genome
size  interacts  with  population  size  in  that  a  huge  genome  has  a  very  large  local 
neighborhood  to  explore  and  if  the  population  size  is  small,  only  a  small  fraction 
of  this  space  will  be  explored. 
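The contrast between the one-mutant neighborhood and the "trap door" operators can be illustrated with a minimal Python sketch. The function names, the list-of-bits genome encoding, and the single-point form of crossing over are assumptions made for illustration; they are not taken from the formalism section of this paper.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def one_mutant_neighbors(genome):
    """All genotypes reachable by flipping exactly one bit."""
    return [genome[:i] + [1 - genome[i]] + genome[i + 1:]
            for i in range(len(genome))]

def invert(genome, start, end):
    """Inversion: reverse the order of the bits in genome[start:end]."""
    return genome[:start] + genome[start:end][::-1] + genome[end:]

def crossover(parent_a, parent_b, point):
    """Single-point crossing over: head of parent_a joined to tail of parent_b."""
    return parent_a[:point] + parent_b[point:]

# Two hypothetical 8-bit genotypes.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [1, 1, 1, 1, 0, 0, 0, 0]

neighbors = one_mutant_neighbors(a)
max_mutation_step = max(hamming(a, n) for n in neighbors)  # mutation moves 1 bit
inversion_step = hamming(a, invert(a, 0, len(a)))          # whole-genome inversion
crossover_step = hamming(a, crossover(a, b, 4))            # child of a and b
```

In this toy case every one-mutant neighbor lies at Hamming distance 1 from `a`, while a single inversion or a single crossing-over event lands at distance 8 or 4 respectively, which is the sense in which these operators connect distant parts of the space.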


2.9  SHOULD  ALL  SPECIES  HAVE  THE  SAME  TIMINGS? 

Genome  size  should  be  inversely  proportional  to  the  number  of  steps  a  population 
can  take  in  a  given  iteration  of  the  model.  Large  genomes  should  take  fewer  steps  and 
so  should  be  able  to  explore  a  smaller  subset  of  the  local  genotype  neighborhood. 
Here  we  see  that  a  natural  complication  of  using  different  N  is  that  we  now  must 
worry  about  relative  rates  of  evolution. 


2.10  OTHER  QUESTIONS 

In  the  spirit  of  Kauffman  and  Weinberger’s  list  (above),  we  will  end  this  discussion 
with  a  list  of  questions  that  will  need  to  be  addressed  in  subsequent  analyses  of 
fitness  landscapes  within  the  NKC  formulation. 

1.  What  is  the  relationship  between  K  and  C? 

2.  Why  is  the  evolution  of  K  more  important  than  evolution  of  C? 

3.  Why  should  the  fitness  contribution  of  K  and  C  have  the  same  magnitude? 
Could  K  and  C  be  in  a  different  “currency?” 

4. How can the fitness of interacting genes be reflected in C?

5.  What  is  the  relationship  between  fitness  components  and  the  number  of  genes? 

6.  How  do  we  introduce  the  long  and  convoluted  series  of  events  (development) 
that  it  takes  from  genes  to  traits?  This  is  critical  because  it  is  traits  that 
interact. 

7. What is the mechanism of ecosystems tuning themselves via changing S's (the
number  of  species)? 

8.  How  important  is  it  to  model  swarms  across  the  discrete  space  instead  of  just 
populations  as  points? 

9. Does it affect the model results if turns are taken synchronously or serially?

10.  How  does  epistasis  in  NKC  models  correspond  to  the  standard  notions  of 
epistasis? 

11.  How  do  we  reflect  the  intuition  that  different  members  of  an  ecosystem  are  of 
different  complexity? 

12. At what levels of organization are the NK/NKC models most appropriate?

3. NKC MODELS AND SOCIAL SCIENCE

In  this  third  part  we  want  to  broaden  the  horizon  of  NKC  models  to  the  social 
sciences.  We  examine  the  application  of  NKC  models  to  the  problem  of  coevolving 
complex  strategies,  such  as  firm  diversification,  and  to  coevolving  complex  belief 
systems,  such  as  attitudes  toward  the  government.  By  “coevolving”  we  mean  that 
different actors adapt to each other's actions over time; by “complex” we mean that
individual  actors  interact  with  each  other  so  that  the  outcome  is  more  than  the  sum 
of  individual  actions.  To  demonstrate  the  usefulness  of  NKC  models  in  the  area  of 
social  sciences,  we  outline  a  few  specific  applications. 

3.1  THE  PROBLEM  OF  COEVOLVING  STRATEGIES 

Evolutionary models have been used to study the coevolution of strategies in the
social sciences. These models assume that effective behavior becomes more common
either because of emulation of successful actors or because of the physical
replacement of actors by more successful innovators. Coevolving strategies in NKC
models differ from the evolutionary models above. Specifically, rather than assuming
that unsuccessful behaviors are replaced, we assume that actors incrementally
change their behavior over time. In this context NKC models would be interpreted
as  follows: 

1.  Each  actor  (species)  has  a  set  of  actions  (genes)  that  it  may  or  may  not  engage 
in.  There  are  N  such  actions  which  together  form  the  strategy  of  an  actor. 

2. Each of those actions contributes to the success, or failure, of an actor. The
overall success of an actor is called the performance (fitness) of an actor.

3. The contribution to performance of a particular action is contingent on K
other actions by the same actor, and C actions by each other actor.

4. Every actor chooses to change a single action during each time period. As in
Kauffman's model, one can assume fitter dynamics or fittest dynamics.
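The four points above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used by Kauffman or by this group: the class name `NKCModel`, the random choice of which actions are coupled, the lazily filled table of uniform random contributions, and the simplified "fitter" rule (try one randomly chosen action and keep the change only if performance improves) are all hypothetical choices. The sketch requires k <= n - 1 and c <= n.

```python
import random

class NKCModel:
    """Toy NKC model: n_actors actors, each with n binary actions."""

    def __init__(self, n_actors, n, k, c, seed=0):
        self.rng = random.Random(seed)
        self.n_actors, self.n = n_actors, n
        # Point 1: each actor holds a strategy of n actions (0 = off, 1 = on).
        self.strategies = [[self.rng.randint(0, 1) for _ in range(n)]
                           for _ in range(n_actors)]
        # Point 3: action i of actor s depends on itself, on k other actions
        # of the same actor, and on c actions of each other actor.
        self.inputs = []
        for s in range(n_actors):
            per_action = []
            for i in range(n):
                own = [(s, j) for j in
                       self.rng.sample([j for j in range(n) if j != i], k)]
                others = [(t, j) for t in range(n_actors) if t != s
                          for j in self.rng.sample(range(n), c)]
                per_action.append([(s, i)] + own + others)
            self.inputs.append(per_action)
        self.table = {}  # random contributions, generated on first use

    def contribution(self, s, i):
        state = tuple(self.strategies[a][j] for a, j in self.inputs[s][i])
        key = (s, i, state)
        if key not in self.table:
            self.table[key] = self.rng.random()
        return self.table[key]

    def performance(self, s):
        # Point 2: performance is the mean contribution of the n actions.
        return sum(self.contribution(s, i) for i in range(self.n)) / self.n

    def step(self, s, fittest=False):
        """Point 4: actor s changes at most one action. Returns True if it did."""
        candidates = list(range(self.n))
        if not fittest:  # simplified fitter dynamics: test one random action
            candidates = [self.rng.choice(candidates)]
        best_i, best_perf = None, self.performance(s)
        for i in candidates:
            self.strategies[s][i] ^= 1       # tentatively flip the action
            perf = self.performance(s)
            self.strategies[s][i] ^= 1       # undo the flip
            if perf > best_perf:
                best_i, best_perf = i, perf
        if best_i is None:
            return False                     # actor is at a local optimum
        self.strategies[s][best_i] ^= 1
        return True

# Two actors taking turns; each move may deform the other's landscape.
model = NKCModel(n_actors=2, n=8, k=2, c=1, seed=42)
for _ in range(50):
    for s in range(model.n_actors):
        model.step(s, fittest=True)
```

Because each actor's contributions depend on C actions of the other actors, a move that improves one actor's performance can lower another's, so an actor that was at a local optimum may start moving again after its partner changes.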

One of the key assumptions of NKC models is that individuals choose only actions
that improve their position locally. Such an assumption is consistent with the
bounded rationality research tradition of human and organizational behavior.
The assumption of myopic behavior allows us to explain which of
many possible Nash equilibria (or “local” Nash equilibria) will be chosen from a
given starting point. Such an approach not only allows us to explain why actors
sometimes choose an inferior local optimum, but also why outcomes are history
dependent and contingent on small events.

3.2 COEVOLVING STRATEGIES OF FIRMS

NKC  models  can  be  used  to  study  coevolving  strategies  of  economic  actors.  A 
particular  example  would  be  a  firm’s  decision  to  diversify  into  new  markets.  The 
decision to diversify is difficult because firms must take into account many interdependent
dimensions, such as the profit potential of different markets, the synergy
effects  of  operating  in  related  markets,  and  the  strategic  responses  of  competitors. 

For  example,  Michelin,  a  major  European  tire  manufacturer,  decided  to  enter 
the  North  American  market  because  it  expected  high  profits  in  that  market  due 
to  its  superior  technology  in  producing  high-quality  radial  tires.  However,  Michelin 
did not expect that Goodyear, a U.S. tire manufacturer, would retaliate by lowering
its tire prices in major European markets. This example not only demonstrates
the  complexity  of  diversification,  but  also  demonstrates  the  consequences  of  myopic 
behavior  when  strategic  decisions  are  linked.  Michelin  tried  to  myopically  improve 
its market position, but at the same time “deformed” Goodyear’s profitability landscape,
which in turn led Goodyear to myopically cut prices in Europe.

In  a  model  of  firm  diversification,  a  single  action  would  be  to  enter  or  exit  a 
market. K would be interpreted as the number of synergies among different markets,
and C the number of interdependent actions between each pair of competitors.
Such a model of firm diversification can be used to study why firms “lock into” suboptimal
positions (or “competency traps,” as Levitt and March call them). This
model allows us to study how the effects of “lock in” vary with the technological
complexity captured by K and C. Once the effects of “lock in” are understood, NKC
models  could  be  used  to  suggest  improvement  of  strategic  behavior,  so  that  firms 
could  “walk  on  rugged  profitability  landscapes”  without  falling  into  competency 
traps. 


3.3  THE  EMERGENCE  OF  BELIEF  SYSTEMS 

NKC models, applied to social psychology, can offer insights into the dynamics
of changes in attitudes and beliefs within an organization or society. Further, they
can explain the existence of particular types of reinforcing cleavages. For example,
there is a strong correlation among attitudes towards government intervention in the
economy, the necessity of a strong military, and the desirability of a social welfare
system. We discuss an application below where people are the level of analysis,
but these models of social conformity might also be applicable to “societies” of
organizations (e.g., governments, businesses, etc.).

Psychology offers the beginning of a solution. One’s beliefs are affected by
the beliefs of surrounding people. Further, there tends to be a consistency
among one’s beliefs, such that the beliefs are mutually reinforcing (“cognitive
consistency”). For example, an individual who believes in the desirability of a
large military budget would tend to believe that such expenditures come at a low
opportunity cost. However, these are results at the individual level. The question
we are asking is about characteristics of aggregate opinion. The NKC model offers a
means of examining what pattern in the aggregate would emerge out of individual
beliefs (for an early model see Abelson and Bernstein; for more recent models see
Novak et al. and March). In this context the NKC model would be interpreted
as follows:

1.  Each  individual  (species)  has  a  number  of  beliefs  (genes)  which  it  may  or  may 
not  hold.  The  total  number  of  beliefs  is  N. 

2.  An  index  of  cognitive  frustration  (fitness)  can  be  constructed,  based  on  the 
level  of  consistency  among  the  individual’s  beliefs,  and  the  congruence  of  those 
beliefs  with  those  of  surrounding  individuals. 

3. The contribution to cognitive frustration of each belief is contingent upon K
other beliefs within that individual, and C other beliefs in each surrounding
actor.

4.  Each  actor  may  change  one  belief  each  round,  either  under  the  assumption  of 
“fitter  dynamics”  or  “fittest  dynamics”  as  outlined  by  Kauffman. 

It  would  be  useful  to  assume  that  all  actors  exhibit  identical  interactions  among 
their own beliefs. For example, for all actors, believing “military spending is necessary
to counter the ‘Soviet threat’” and “military spending is good for the economy”
results in a better cognitive frustration score than believing just one or the other.
This model could be applied to the dynamics of opinion change. For example, until
very recently the belief in the Soviet threat was the organizing principle of U.S. foreign
policy. Beliefs about military spending, and about policies towards particular
countries,  among  other  things,  were  shaped  by  this  overarching  belief.  Now  that 
this  belief  has  been  exogenously  changed,  a  critical  question  is  how  other  beliefs 
about  U.S.  foreign  policy  will  change. 


3.4  SUMMARY 

The  above  interpretation  of  NKC  models  as  coevolving  strategies  shows  that  this 
family  of  models  has  a  much  broader  range  of  application  than  just  explaining 
coevolving  species  in  biology.  In  particular,  NKC  models  can  be  used  in  such  diverse 
fields as economics, political science, organizational theory, and social psychology.
NKC  models  can  be  applied  to  any  research  area  that  involves  studying  complex, 
coevolving  behaviors  at  the  individual,  group,  or  organizational  level. 


4.  CONCLUSIONS 

We  found  NKC  models  to  be  stimulating  and  illuminating.  We  have  tried  to  bring 
together  three  of  the  main  directions  that  our  group  took.  The  first  section  of  this 
paper summarized a mathematical formalism for the NK family of models. The
second section focused on questions raised by the models' assumptions. Our third section explores possible applications of NK
models to the social sciences.

We found the NK and NKC formulations to be a good intersection and basis
for discussion for individuals whose areas of interest reached from immunology
to evolution to physics and (as demonstrated in section 3 above) economics and
political science. This is one of the strengths of this approach but may also be
its downfall. We consistently found it difficult to rigorously translate the terms and
relations of NKC models into the terminology and “facts” of a particular discipline.
We look forward to future results in both the theory and the specific applications
of these models.


Complex Patterns, Simply Recognized


INTRODUCTION 

Visual recognition may be taken to be the ability to respond specifically to a particular
scene or view from among many similar ones. Images falling on our retinae
also need somehow to be functionally related to our memories of previously seen
images for identification to be of any use.

Creatures with a need to respond to the objects in their visual environment in
unsophisticated  ways  can  be  preprogrammed  with  a  repertoire  of  stimulus/response 
behaviors.  With  increasing  environmental  complexity,  such  programming  becomes 
less  practical,  and  a  more  flexible  approach  is  required.  Things  seen  need  to  be 
recorded  and  used  to  modify  future  action  in  order  to  survive. 

The  human  visual  system  is  powerful  and  anatomically  complex.  This  raises 
the  question:  Is  the  process  of  recognition  itself  necessarily  complex?  What  would 
be the minimum system requirement for human-like recognition in real visual environments?
The following three major requirements may be put forward as probable
prerequisites  for  the  formation  of  useful,  internal  representations  of  visual  images. 


1991 Lectures in Complex Systems, SFI Studies in the Sciences of Complexity,
Lect. Vol. IV, Eds. L. Nadel & D. Stein, Addison-Wesley, 1992

1. Selection: Whether it be via wavelength selectivity or some higher-level process,
some  choice  about  survival  relevance  has  to  be  made  about  what  should  be 
extracted  from  the  welter  of  information  available  to  the  retina. 

2.  Specificity:  To  avoid  too  much  confusion,  the  system  should  display  a  high  level 
of  specificity  so  that  even  small  changes  in  an  image  should  result  in  perceptible 
differences in the resulting representation.

3.  Relatedness:  Things  seen  should  be  related  in  visual  memory  by  a  range  of 
associations (... The shape of that hat reminds me of a car she was driving ...).

A  simple  algorithm  has  been  developed,  with  reference  to  the  human  visual 
system,  which  can  associatively  store  and  retrieve  information  about  a  large  range 
of  different  images  and  thereby  act  as  a  visual  recognition  device. 

The  central  question  of  how  we  respond  to  objects  may  be  addressed  as  a 
problem  of  coding  (to  gain  access  to  visual  memory)  and  decoding  (the  subsequent 
execution  of,  for  example,  appropriate,  voluntary  muscle  movements).  This  work 
addresses only the first part of the problem. The basic hypothesis is that images
which look similar to a human being should result in the production of “similar”
codes by any process which attempts to simulate human recognition performance.

It  must  be  stated  that  such  things  as  image  movement,  stereopsis,  color  vision, 
and level of attention have not been considered in this study. Neither was the
objective to explain how efficient image transmission and reconstruction may be
performed.  The  sensation  of  seeing  itself  is  ignored.  What  is  experienced  when, 
for  example,  an  (emotionally  neutral)  triangle  is  viewed  may  be  just  as  much  of 
an  internal  construct  as  is  generally  believed  to  occur  when  we  see  things  in  our 
mind’s eye. This approach challenges Marr’s assertion that an explanation of
vision  must  conform  to  the  plain  man’s  experience  of  it.  It  seems  reasonable  that 
no  theory  must  contradict  objective  measurements  of  experience,  but  introspection 
is  not  necessarily  a  reliable  test  of  any  theory.  The  visual  system  must  be  able  to 
deal  with  large  numbers  of  combinations  of  sensory  inputs  (these  are  limited,  in 
practice, by the finite human lifespan and the fact that our visual environment is
actually  much  less  than  infinitely  variable).  Here  we  restrict  the  problem  still  further 
to  the  identification  of  monochrome  images  of  objects  (and  parts  of  objects).  Even 
so, the number of potential images is intimidatingly large.

As a familiar introductory example of the type of question for which an explanation
is required, how can the following all be recognized as variants on the same
theme, while still being seen as subtly different ...

[Figure: several variants of the letter “A”.]


BACKGROUND 

Seeing something is clearly necessary but not sufficient for it to be recognized (it
might be the first experience of it, or it may be out of focus and thereby only
identifiable as being within a certain general category, or it may be simply upside-down).
Experiments on recognition performance require, ideally, the control of a
priori visual experience, which is difficult with human subjects. Much of the work
in  the  literature,  therefore,  does  not  take  this  into  account  or  relies  on  limiting  the 
recognition  task  to  a  discrimination  between  membership  and  non-membership  of 
a  designated  set  of  such  pictures.  Demonstrations  of  the  enormous  capacity  of  the 
human  visual  system  to  store  and  retrieve  pictorial  material  have  been  conducted. 
A  few  examples  are  given  below. 

Goldstein and Chance presented their subjects, for three seconds each, with
pictures  in  three  categories:  women’s  faces,  magnified  snowflakes,  and  inkblots 
(three  seconds  corresponds  to  no  more  than  nine  fixations).  Subjects  were  able 
to  achieve  71%  success  in  distinguishing  between  those  slides  of  faces  which  they 
had  seen  before  and  those  which  they  had  not.  (The  recognition  rate  corresponding 
to success by chance was 14%.)

Potter and Levy found recognition accuracy for pictures varied from 15%
(with  a  125ms  exposure)  to  90%  (for  two  seconds’  exposure). 

Recognition  performance  was  found  to  be  a  positive  function  of  the  number 
of  fixations  on  a  given  picture  and  is  not  dependent  on  viewing  duration  per  se 
(if  the  number  of  fixations  is  restricted  to  being  constant).  Pictures  viewed  only 
peripherally are not remembered at all.

Results such as these seem to show that at least some significant things in large
numbers  of  pictures  can  be  efficiently  stored  in  memory,  despite  restricted  access 
to  their  information  content,  if  the  pictures  are  looked  at  directly. 

The history of attempts to achieve pattern recognition has included both efforts
inspired by biological systems and those which ignored Nature. The list of ideas
includes: whole-pattern templates, e.g., the “bug detectors” of Lettvin et al.;
feature (mini template) detectors, derived from interpretations of the work of Hubel
and Wiesel; Marr-Nishihara canonical elements; massively parallel statistical
sieves (neural networks); Fourier transforms motivated by the findings of Campbell
and Robson; and so-called structural models (lists of characteristic properties)
from work on artificial intelligence. Few have been successful by any standard. The
work described here is different in that it involves viewing each scene a small area
at a time and forming a unique representation of each successive, small “window”
taken as a whole.

The cortices of cats, monkeys, and humans have been shown to perform analysis
of visual images by the use of oriented local filters tuned to different spatial
frequencies (spatial frequency bandwidths of one octave and orientation tuning
bandwidths of 15-20° are typical). It is not clear what, if any, significance these
cells have for the recognition of patterns. A very large part of the visual cortex of
these species is devoted to treatment of signals from the fovea, a tiny area near
the center of the retina. The area with diameter subtending, in humans, the central
20 minutes of arc of the fovea was designated the foveola by Polyak. A fingernail
seen at arm’s length subtends about one degree at the eye. The foveola,
therefore,  subtends  an  angle  equal  to  one  third  of  a  fingernail.  This  tiny  region 
seems to have great significance for the recognition of patterns.

The foveola is a particularly suitable subject for the study of vision because:

■ it has a large cortical representation;

■ it contains comparatively few, regularly arranged anatomical elements (only
cones are present, providing dichromacy);

■ it forms a direct link to the visual cortex;

■ it contains a relatively low ratio of cones to ganglion cells and the optics
which produce foveolar images have been extensively investigated.

In  addition,  cells  tuned  to  a  very  wide  range  of  spatial  frequencies  are  found  in  the 
foveal  region  of  the  visual  cortex. 

Harmon showed faces could be recognized using a coarse pixellation of 16 x 16
with eight grey levels. More recent work by Campbell has confirmed that only a
few  hundred  activated  groups  of  seven  cones  at  a  time  in  the  foveola  of  the  human 
retina  are  required  to  identify  most  everyday  objects.  This  has  the  result  that  faces 
can  be  identified  at  a  distance  of  35  m. 
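The kind of coarse pixellation described above can be sketched in a few lines of Python. This is only an illustration of block averaging followed by grey-level quantization; it is not Harmon's procedure, and the function name, the list-of-rows image format with values in [0, 1], and the test image are assumptions.

```python
def pixellate(image, blocks=16, levels=8):
    """Reduce a square grey image to blocks x blocks cells with `levels` greys.

    `image` is a list of rows of floats in [0, 1]; its side length must be a
    multiple of `blocks`. Each output cell is the mean of its block, quantized
    to an integer grey level in 0 .. levels - 1.
    """
    size = len(image)
    if size % blocks != 0:
        raise ValueError("image side must be a multiple of the block count")
    cell = size // blocks
    coarse = []
    for by in range(blocks):
        row = []
        for bx in range(blocks):
            total = sum(image[y][x]
                        for y in range(by * cell, (by + 1) * cell)
                        for x in range(bx * cell, (bx + 1) * cell))
            mean = total / (cell * cell)
            row.append(min(int(mean * levels), levels - 1))
        coarse.append(row)
    return coarse

# Hypothetical 64 x 64 test image: a left-to-right brightness ramp.
image = [[x / 63 for x in range(64)] for _ in range(64)]
coarse = pixellate(image)
```

The result holds 16 x 16 = 256 cells at three bits each, which gives a sense of how little information such a coarse representation carries.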

Consider  constancy  of  recognition;  it  is  commonly  accepted  that  in  order  for  a 
system  to  be  able  to  recognize  an  object  at  a  distance,  for  example,  it  should  be 
able  to  manipulate  a  scale-invariant  internal  representation  of  that  object  so  as  to 
equate it to the current view and permit recognition. Such invariants actually seem
not to occur in the foveola, however. Size constancy fails below 1/2 degree.

■  Rotation  invariance:  It  is  very  important  that,  for  example,  a  right-angled 
diamond  and  a  square  of  equal  side  length  are  perceived  as  different. 

■ Position invariance: The threshold for displacement detection in an unstructured
field is near 1.5 min arc, or about three foveolar cone diameters.

The  freedom  of  the  eye  to  move  makes  the  notion  of  position  invariance  vague. 

What  about  local  and  global  changes  of  illumination  and  occlusion?  Recognition 
under  these  circumstances  is  actually  rather  hard  to  do.  We  are  not  particularly 
good  at  spotting  camouflaged  wildlife  or  reading  an  eye  chart  on  which  a  mixture 
of  sunlight  and  the  shadows  of  a  leafy  branch  have  been  superimposed.  By  viewing 
a  scene  as  a  sequence  of  small  areas,  problems  of  figure/ground  segmentation  (such 
as  looking  for  a  particular  bo.v  in  a  box  of  engineering  components)  can  be  rendered 
tractable. 


OPERATION  OF  THE  PRESENT  MODEL 

Every white dot in a black and white picture screen is assumed to spread, according
to  a  simplified  simulation  of  diffraction,  which  causes  a  group  of  cones  to  become 
activated.  This  is  shown  for  a  pattern  consisting  of  two  stars  or  dots  in  Figure  1 
(natural  scenes  or  hand-drawn  images  can  be  accommodated). 

As the model eye moves from fixation to fixation, each successive small area
of an image falling on the central few hundred receptors of a simulated retina is
analyzed. The image of an object or part-object on this central 1/3 degree of the
retina is assumed to be moved relative to its original position by signals communicated
between retinal cells in the three principal axes of the regular, hexagonal cone
mosaic (Figure 1). This activation pattern may then be sampled and processed by
idealized cells which are each sensitive to a narrow range of orientation and spatial
frequency. The differencing operator indicated in Figure 2 fulfills this function and
generates results which are broadly consistent (C1, C2, C3, ...) with the kinds of responses
actually recorded from complex cells in the cortices of mammals. Attempts
to explain the significance which these cells may have for the recognition of objects
have hitherto been unsuccessful. For each of the three principal orientations of the
receptor mosaic, the outputs of these simulated cells are summed, giving rise to
the three-element code. It is of particular interest that these simulated cells generate
relatively small responses to “meaningless” random dot patterns (visual noise).
This is believed to be related to the physiological finding that no real long-term
memories are formed from such images, thus avoiding potentially massive waste of
memory capacity, which might result from combinatorial explosion.

The asymmetric, local transmission and adding of activation values in the plane
of this simulated retina, shown in Figure 1, has the effect of specifically labelling
edges within an image according to their orientations. When this resultant activation
matrix is analyzed by the oriented receptive fields, the (x, y, z) code produced
is characteristic of the original image in the sense that any change in the image
(other than adding uniform noise) must affect at least one of the (x, y, z) components.
This type of process has been previously discussed in connection with the
well-developed visual system of the octopus.

A single receptive field width of three cones and three orientations has been
used. This is the computationally simplest selection which avoids errors of orientation,
etc., and which is still capable of surprisingly effective recognition performance.
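A three-element code of this general kind can be sketched in Python as below. The sketch is a stand-in, not the operator of Figure 2, which is not reproduced in the text: it assumes the hexagonal mosaic is stored on a square grid (as the Figure 1 caption describes), takes the three mosaic axes to be the grid directions (1, 0), (0, 1), and (1, 1), and uses a three-cell center-minus-flanks difference as a hypothetical receptive field. It also omits the asymmetric spreading of activation that precedes the differencing stage.

```python
# Hypothetical grid directions (dx, dy) standing in for the three mosaic axes.
AXES = [(1, 0), (0, 1), (1, 1)]

def three_element_code(activation):
    """Sum a three-cell differencing response along each axis.

    `activation` is a list of rows of numbers. For each axis, every cell with
    both flanking cells inside the grid contributes
    |2 * center - flank_before - flank_after|, and the contributions are
    summed into one component of the code.
    """
    rows, cols = len(activation), len(activation[0])
    code = []
    for dx, dy in AXES:
        total = 0.0
        for y in range(rows):
            for x in range(cols):
                x0, y0, x1, y1 = x - dx, y - dy, x + dx, y + dy
                if 0 <= x0 < cols and 0 <= y0 < rows and 0 <= x1 < cols and 0 <= y1 < rows:
                    total += abs(2 * activation[y][x]
                                 - activation[y0][x0]
                                 - activation[y1][x1])
        code.append(total)
    return tuple(code)

# Two 5 x 5 test patterns: a horizontal bar and a vertical bar.
horizontal = [[1 if y == 2 else 0 for x in range(5)] for y in range(5)]
vertical = [[1 if x == 2 else 0 for x in range(5)] for y in range(5)]
code_h = three_element_code(horizontal)
code_v = three_element_code(vertical)
```

With this stand-in operator the two bars receive different codes, and adding the same constant to every cell leaves the code unchanged because the difference cancels it. The sketch makes no claim to reproduce the stronger property stated in the text, that any non-uniform change must alter at least one component.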

Resulting  codes  have  been  generated  and  plotted  as  the  coordinates  of  points 
in  a  three-dimensional  representation  space,  each  of  which  uniquely  stands  for  a 
particular  view. 

Each  view  of  an  object  produces  a  slightly  different  coding  so  that  similar 
views “clump together” in representation space. This results in automatic, non-rigid
perceptual categorization. Novel objects are automatically classified by virtue
of  their  proximity  in  representation  space  relative  to  those  of  known  images.  Known 
objects  are  coded  and  can  reactivate  their  existing  representation  and  its  associates: 
i.e.,  they  are  recognized. 
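Classification by proximity in the three-dimensional representation space can be sketched as a nearest-neighbor lookup. The stored codes, labels, and the `recognition_radius` threshold below are hypothetical values chosen for illustration, and Euclidean distance is an assumed choice of metric; the text does not specify one.

```python
import math

def classify(code, known, recognition_radius=0.0):
    """Find the stored representation closest to `code`.

    `known` maps a label to a stored (x, y, z) code. A code lying within
    `recognition_radius` of a stored one counts as recognized; any other
    code is treated as novel and assigned to the nearest stored category.
    """
    label = min(known, key=lambda name: math.dist(code, known[name]))
    distance = math.dist(code, known[label])
    status = "recognized" if distance <= recognition_radius else "novel"
    return label, status, distance

# Hypothetical stored representations of two previously seen views.
known = {"triangle": (4.0, 1.0, 1.0), "square": (2.0, 2.0, 0.0)}

seen_before = classify((4.0, 1.0, 1.0), known)  # identical code
new_view = classify((3.5, 1.0, 1.5), known)     # slightly different view
```

Here the identical code reactivates the stored "triangle" representation, while the slightly different view is novel but still falls nearest to "triangle", which is the clumping behavior described above.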

As  we  manipulate  an  object  or  move  around  it,  we  foveate  many  successive 
views. Hochberg has said that perception depends on integration of the parts
seen  foveally  in  each  of  several  glimpses.  The  continuity  of  these  trajectories  has 
the  effect  that  objects  presented  in  the  continuously  varying  sizes  and  orientations 
of everyday experience can still be recognized.

FIGURE 1 Activation levels, caused by an image of two “stars” falling on the simulated
retina, are shown being transmitted asymmetrically along the axes of the hexagonal
mosaic so as to form a residual activation matrix (mosaic is shown distorted to square
for computational ease).


Warrington and Taylor found that certain neurological patients were capable
of recognizing objects only when seen from “conventional” viewpoints (suggesting
that they were unfamiliar with other views and had, therefore, no stored representation
of them). Similarly, Palmer, Rosch, and Chase reported that, for example, an unusual
view of a horse was not easily recognizable. This suggests that, rather than computing
what the plain view of a horse actually was from stored, conventional-view
“coordinates,” failure ever to have seen and recorded this view meant that it was
simply not associated with other, conventional views labelled “horse.” Yin found
that his subjects were poor at recognizing inverted faces. Diamond and Carey
found that this was true for a wide range of other objects, too.

FIGURE 2 Design of simulated receptive fields.

Faces (the relevant literature is reviewed in Bruce) may be a special case
in  that  they  form  a  category,  the  members  of  which  clearly  differ  from  each  other 
somewhat  less  than  from  the  members  of  different  groups  (e.g.,  motor  vehicles).  For 
example,  there  have  been  many  reports  of  patients  who  demonstrated  an  inability 
to  recognize  whole  categories  of  objects  (not  just  faces,  as  reported  in  Bodmer^) 
after  local  lesions.  Some  of  these  results  seem  to  suggest  that  local  lesions  produce 
memory  loss  for  objects  of  a  very  specific  category. 

Images  consist  of  the  spatial  relationships  between  their  elements  (pixels).  Any 
attempt  to  analyze  an  image  and  form  its  representation  by  counting  up  the  number 
of edges in different orientations, for example, is likely to fail to encode the relationships
which occur at corners, etc., or requires the a priori specification of a wide
range  of  unwieldy  “elements”  of  which  all  images  may  be  assumed  to  be  composed. 
An  unspecific  (resulting  in  confusions  like  “L”  for  “7”)  or  insufficiently  applicable 
code  is  produced.  In  the  work  described  here,  more  of  the  spatial  interrelationships 
are  encoded,  leading  to  an  ability  to  form  representations  of  a  wide  range  of  image 
types. 


RESULTS 

Results are presented for a small range of different images (Figures 3 through 6) as three-dimensional plots of their representations (Figures 7 through 10). Figure 3 is a composite of the simpler patterns, all of which appear as representations in the bottom right-hand corner of Figure 7. The numbers in bold type in Figures 4, 5, and 6 are used to indicate the relative positions of representations in Figures 7 through 10. Figure 7 shows the overall layout of representation space and indicates that a wide variety of different types of images can be accommodated within this scheme. There is a general increase in image “complexity” from the origin (single-point representations) outwards toward the representations of faces. Figures 8, 9, and 10 look in more detail at regions of the space shown in Figure 7.

FIGURE 10 Some more complicated images, including a coarse temporal sequence.

Similar  images  do,  indeed,  result  in  similar  codes  and  this,  in  turn,  causes 
clustering of their representations. A short trajectory is shown for the image of a
fist opening its fingers. Also note that the perceptually ambiguous image of the
duck/rabbit is between that of the duck and that of the rabbit. This is true also
for the ambiguous R/P image which lies between the points for P and R.


DISCUSSION  AND  CONCLUSIONS 

These ideas may relate to the work on the inferotemporal cell ensembles known to be selectively responsive to complex visual stimuli. Perhaps an ensemble might correspond to a knot of trajectories in representation space, each signalling the presence of a view similar to those to which its neighbors are sensitive. Perret et al.,^21 for example, reported the apparent storage of face information in the inferotemporal cortex. Sakai and Miyashita^25 reported that IT cells recorded the temporal sequence of unfamiliar visual images. It has been shown that IT cells have receptive fields centered on the fovea and that adjacent cells have similar response properties.

The number of foveations in a human lifetime (3 per second, about 5 x 10^9) is,
surprisingly,  orders  of  magnitude  less  than  the  number  of  neurons  in  the  visual 
centers  of  the  brain,  making  it  hard  to  dismiss  these  ideas  purely  on  the  grounds  of 
“capacity.”  It  is  possible,  it  would  seem,  for  recognition  to  be  performed  on  the  basis 
of  stored  representations  of  every  single  foveation  in  a  human  lifetime.  This  view  of 
the visual system regards the brain as essentially a simple image analyzer linked,
by  a  potentially  simple  coding  process,  to  an  enormous  data  bank  of  efficiently 
associated  visual  memories  of  shape  information. 
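
The order of magnitude of the lifetime foveation count can be checked with back-of-envelope arithmetic. The sketch below is only illustrative: the rate of 3 foveations per second is the figure quoted in the text, while the 16 waking hours per day and the 70-year lifespan are assumptions introduced here.

```python
# Rough count of foveations in a human lifetime.
FOVEATIONS_PER_SECOND = 3   # rate quoted in the text
WAKING_HOURS_PER_DAY = 16   # assumption, not from the text
LIFETIME_YEARS = 70         # assumption, not from the text

seconds_awake = LIFETIME_YEARS * 365 * WAKING_HOURS_PER_DAY * 3600
lifetime_foveations = FOVEATIONS_PER_SECOND * seconds_awake

print(f"{lifetime_foveations:.1e} foveations")
```

Under these assumptions the total comes out at a few times 10^9, so the estimate is not sensitive to the exact choice of lifespan or waking hours.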

The  postulated  trajectories,  if  they  exist  in  reality,  could  give  an  insight  into 
prediction  of  what  is  about  to  appear.  It  may  also  be  that  linkage  strength,  between 
locations  forming  a  trajectory,  is  related  to  probability  of  recall  by  some  Hebb-like 
rule.  Indeed,  this  system  could  be  thought  of  as  having  the  capacity  to  jump  to  the 
wrong conclusion; when asked to identify a church steeple, it may respond “rocket
nose cone.” This illustrates an ability to generalize and make errors.

The system described here is very simple, yet does seem to have some useful
properties:

■ It is potentially fast: one trial learning.

■ Segmentation can be achieved by looking at small enough areas at a time; local
shadows and occlusion can be accommodated.

■  It  is  general  and  flexible. 

■ It is specific and accurate as long as images are not composed of regions which
are separated by empty spaces larger than 3W (see Figure 8).

■ Only simple “technology” is required.

■  Two  ways  to  associate  things  are  incorporated — visual  “similarity”  and  expe¬ 
rience  of  sequences  stored  as  trajectories. 

Kohonen^14 estimated in 1988 that there had been 30,000 papers published on
pattern  recognition  and  that  the  performance  of  artificial  methods  fell  far  short  of 
that  of  biological  sensory  systems.  Perhaps  we  fail  to  recognize  that  although  bio¬ 
logical  systems  are  dauntingly  complex  at  first  sight,  they  often  have  an  underlying 
simplicity  of  principle. 


ACKNOWLEDGMENTS 

Thanks are due to Professor F. W. Campbell, FRS; Professor Igor Aleksander;
Aileen Briggs; the U.S. Department of the Environment; the National Science Foundation;
the Office of Naval Research; a consortium of universities and laboratories,
including the Santa Fe Institute; and Apple Computer Inc. This work was partly
supported by a grant from the Kenneth Craik Fund of St. John’s College, Cambridge.

Note:  a  copy  of  the  software  used  in  these  investigations  (which  runs  on  the 
Apple  Macintosh  computer  and  makes  use  of  Microsoft’s  QuickBasic  compiler  and 
Apple’s  HyperCard  software)  is  available  from  the  author  at  the  above  address. 


REFERENCES 

1. Bodmer, J. “Die Prosop-Agnosie.” Archiv für Psychiatrie und Nervenkrankheiten 179 (1947): 6-53.

2.  Bruce,  V.  Recognizing  Faces.  London:  Lawrence  Erlbaum,  1989. 

3.  Campbell,  F.  W.,  and  R.  W.  Gubisch.  “Optical  Quality  of  the  Human  Eye.” 
J.  Physiol.  186  (1966):  558-578. 

4.  Campbell,  F.  W.,  and  J.  G.  Robson.  “Application  of  Fourier  Analysis  to  the 
Visibility  of  Gratings.”  J.  Physiology  197  (1968):  551-566. 

5.  Campbell,  F.  W.,  and  Y.  E.  Shelepin.  “The  Mechanics  of  the  Foveola  and 
Its  Role  in  Defining  an  Object.”  Paper  presented  at  the  Fergus  Campbell 
Symposium  on  the  Temporal  and  Spatial  Domain,  Twelfth  ECVP  Abstracts, 
A50. 



6. Curcio, C. A., K. R. Sloan, R. Kalina, and A. Hendrickson. “Human Photoreceptor Topography.” J. Comp. Neurology 292 (1990): 497-523.

7. De Valois, R. L., and K. K. De Valois. Spatial Vision. Oxford: Oxford University Press, 1988.

8. Diamond, R., and S. Carey. “Why Faces Are and Are Not Special: An Effect of Expertise.” J. Exp. Psych. 115 (1986): 107-117.

9. Ditchburn, R. W. Eye Movements and Visual Perception. Oxford: Clarendon Press, 1973.

10. Goldstein, A. G., and J. E. Chance. “Recognition of Complex Visual Stimuli.” Perception & Psychophysics 9 (1971): 237-241.

11. Harmon, L. D. “The Recognition of Faces.” Sci. Am. 229 (1973): 71-82.

12. Hochberg, J. “Levels of Perceptual Organization.” In Perceptual Organization, edited by M. Kubovy and J. R. Pomerantz. Hillsdale, NJ: Erlbaum, 1981.

13. Hubel, D. H., and T. N. Wiesel. “Receptive Fields and Functional Architecture of Monkey Striate Cortex.” J. Physiology 195 (1968): 215-243.

14. Kohonen, T. “The Role of Adaptive and Associative Circuits in Future Computer Designs.” Neural Computers, edited by R. Eckmiller and C. H. von der Malsburg. Heidelberg: Springer Verlag, 1988.

15. Legge, G., and F. W. Campbell. “Displacement Detection in Human Vision.” Vision Research 21: 205-213.

16. Lettvin, J. Y., H. R. Maturana, W. S. McCulloch, and W. H. Pitts. “What the Frog’s Eye Tells the Frog’s Brain.” Proc. Inst. Rad. Engrg. 47 (1959): 1940-1951.

17.  Loftus,  G.  R.  “Eye  Fixations  and  Recognition  Memory  for  Pictures.”  Cogni¬ 
tive  Psych.  3  (1972):  525-551. 

18. Marr, D., and H. K. Nishihara. “Representation and Recognition of the Spatial Organization of Three-Dimensional Shapes.” Proc. Roy. Soc. London B 200 (1978): 269-294.

19. Marr, D. Vision. San Francisco, CA: Freeman, 1982.

20. Palmer, S. E., E. Rosch, and P. Chase. “Canonical Perspective and the Perception of Objects.” In Attention and Performance, edited by J. Long and D. Baddeley, vol. IX. Hillsdale, NJ: Lawrence Erlbaum, 1981.

21. Perret, D. I., P. A. J. Smith, D. D. Potter, A. J. Mistlin, A. S. Head, A. D. Milner, and M. A. Jeeves. “Neurones Responsive to Faces in the Temporal Cortex: Studies of Functional Organisation, Sensitivity, and Relation to Perception.” Human Neurobiol. 3 (1984): 197-211.

22. Polyak, S. The Retina. Chicago: University of Chicago Press, 1941.

23. Potter, M. C., and E. I. Levy. “Recognition Memory for a Rapid Sequence of Pictures.” J. Exp. Psych. 82 (1969): 10-15.

24.  Ross,  J.,  B.  Jenkins,  and  J.  R.  Johnstone.  “Size  Constancy  Fails  Below  Half  a 
Degree.”  Nature  283  (1980):  473-474. 

25.  Sakai,  K.,  and  Y.  Miyashita.  “Neural  Organization  for  the  Long-Term  Mem¬ 
ory  of  Paired  Associates.”  Nature  354  (1991):  152-155. 




26. Sutherland, N. S. “Visual Discrimination of Orientation and Shape by the Octopus.” In Perceptual Processing, Stimulus Equivalence and Pattern Recognition, edited by P. C. Dodwell. New York: Meredith Corporation, 1971.

27. Warrington, E. K., and A. M. Taylor. “Two Categorical Stages of Object Recognition.” Perception 7 (1978): 695-705.

28.  Wassle,  H.,  and  B.  B.  Boycott.  Physiological  Reviews  71  (1991):  2. 

29.  Yin,  R.  K.  “Face  Recognition  by  Brain-Injured  Patients:  A  Dissociable  Abil¬ 
ity?”  Neuro-Psychologia  8  (1970):  395-402. 


Antonio C. Roque Da Silva Filho

School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton BN1
9QH, United Kingdom


Dynamical Behavior of a Pair of Spatially
Homogeneous Neural Fields


Various  types  of  dynamical  behaviors  are  studied  for  a  neural  network 
made  of  excitatory  and  inhibitory  neurons  arranged  separately  in  two  layers 
under  the  assumption  of  uniform  activity  throughout  the  layers.  The  layers 
are  treated  mathematically  as  continuous  one-dimensional  fields,  and  the 
neurons  have  binary  outputs.  The  analysis  of  the  system  follows  one  done 
previously  by  Amari,  but  now  the  synaptic  strengths  can  vary  with  time 
according  to  two  versions  of  Hebb’s  rule.  The  general  existence  conditions 
of  the  dynamical  behaviors  for  the  two  versions  are  investigated,  and  the 
allowed  cases  presented. 


1.  INTRODUCTION 

Neural networks are very complex dynamical systems, posing enormous difficulties for theoreticians who try to treat them mathematically. So far, only very simple kinds of networks have been satisfactorily analysed in mathematical terms (for a review see, e.g., Levine^), leaving many questions about more general networks still unanswered. However, simple systems can sometimes give us valid insights about the behavior of more complex ones, and one can always hope that, by adding some small new features to simple systems, one can find new types of behaviors and gain a deeper understanding of more complex systems. The purpose of the present work is to try to show this for neural networks made of excitatory and inhibitory neurons.

Earlier  studies  on  dynamical  behaviors  of  neural  networks  consisting  of  neurons 
which  are  either  excitatory  or  inhibitory,  and  have  connections  of  the  so-called 
lateral  inhibition  type,  were  done  in  the  1970s  by  Wilson  and  Cowan,®’^  Ellias  and 
Grossberg,”  and  Amari.^ 

In particular, Amari modelled the neurons as being arranged in a continuous fashion along a pair of one-dimensional neural fields, one made of excitatory neurons and the other made of inhibitory neurons. The states of the points on the fields were described by functions $u_i(x, t)$, $i = 1, 2$, giving, for each instant of time $t$, the average membrane potentials of the neurons around $x$ on the excitatory and inhibitory fields respectively. The $u$'s were assumed to have a rate of change with time proportional to the weighted integrals over the fields of the outputs of the excitatory and inhibitory neurons, including self-excitation. The outputs of the neurons were assumed to be given by the step function

$$f[u] = \begin{cases} 0 & \text{if } u \le 0; \\ 1 & \text{if } u > 0. \end{cases}$$


This was done for mathematical convenience, since it was claimed that the results obtained would be valid for a monotonically increasing output function of $u$ with saturation. The synaptic strengths were assumed to be time invariant and dependent only on the distances between the neurons, $\omega(x, x') = \omega(x - x')$. The inhibitory neurons did not have connections among themselves, and the connections from excitatory to inhibitory neurons had a very narrow fan out, so that only the inhibitory neurons immediately below a given point on the excitatory field would receive connections from it. Besides, the strengths of the excitatory-excitatory synapses were stronger than the strengths of the inhibitory-excitatory synapses at short distances, but weaker than them at longer distances (characterizing the lateral inhibition kind of connections).

Amari studied the dynamics of his field equations for two special cases, namely when the solutions are spatially homogeneous, $u(x, t) = u(t)$, and when the solution is a stationary travelling wave of a fixed shape, $u(x, t) = g(x - vt)$. He showed that solutions of both kinds are possible, and gave some examples of them. In particular, for the spatially homogeneous case, he found stable, oscillatory, and transient behaviors. Assuming spatial homogeneity one can represent the state of the system by the vector $\mathbf{u} = (u_1, u_2)$ in the $u_1$-$u_2$ plane. In the first quadrant $u_1$ and $u_2$ are positive, and because $f[u]$ is the step function, $f[u_1] = f[u_2] = 1$ in this quadrant. In the same way for the second, third, and fourth quadrants, one has $f[u_1] = 0$ and $f[u_2] = 1$, $f[u_1] = f[u_2] = 0$, and $f[u_1] = 1$ and $f[u_2] = 0$ respectively. In the $u_1$-$u_2$ plane, a stable state was identified as a constant vector in a certain quadrant towards which $\mathbf{u}$ would tend when having its initial position in a different quadrant. A transient behavior was identified as a situation in which the initial vector $\mathbf{u}_0$ happened to be in the same quadrant as the constant vector, so that the system would never get out of that quadrant, decaying quickly towards the constant vector. On the other hand, the oscillatory behavior was characterized by a constant jump of the state vector from a quadrant to the next one, and from this one to the next one, etc.
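
The correspondence between quadrants of the $u_1$-$u_2$ plane and the binary outputs of the two fields can be sketched in a few lines. The function names `step` and `quadrant` are hypothetical, and treating $u = 0$ as "not firing" is an assumption made here to handle the boundary.

```python
def step(u: float) -> int:
    """Binary output f[u]: 1 when the potential is positive, else 0.

    Assumption: u == 0 is treated as not firing.
    """
    return 1 if u > 0 else 0


def quadrant(u1: float, u2: float) -> str:
    """Label the quadrant of the (u1, u2) plane by the pair (f[u1], f[u2])."""
    labels = {(1, 1): "I", (0, 1): "II", (0, 0): "III", (1, 0): "IV"}
    return labels[(step(u1), step(u2))]


print(quadrant(0.5, 0.2), quadrant(-1.0, 2.0), quadrant(-1.0, -1.0), quadrant(1.0, -1.0))
```

Because the outputs are constant inside each quadrant, the dynamics only change form when the state vector crosses an axis, which is what makes the quadrant-by-quadrant analysis possible.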

In  this  work,  Amari's  analysis  is  extended  by  incorporating  into  his  model  the 
following  features: 

■  The  inhibitory  neurons  have  connections  among  themselves. 

■  The  excitatory-inhibitory  connections  have  a  larger  fan  out,  so  that  excitatory 
neurons  at  a  point  x  can  make  synapses  to  inhibitory  neurons  located  at  points 
other  than  x. 

■ The two fields receive an external excitatory input v.

■  All  the  synaptic  strengths  can  vary  with  time  according  to  rules  defined  in  the 
next  section. 


2.  THE  FIELD  EQUATIONS 

The general field equations obeyed by the excitatory and inhibitory membrane potentials $u_1(x,t)$ and $u_2(x,t)$ are

$$\tau \frac{\partial u_1(x,t)}{\partial t} = -u_1(x,t) + \int \omega_1(x,x',t)\, f[u_1(x',t)]\,dx' - \int \omega_2(x,x',t)\, f[u_2(x',t)]\,dx' + s_1(x,t)\,v(x,t) - h_1, \tag{1}$$

and

$$\tau \frac{\partial u_2(x,t)}{\partial t} = -u_2(x,t) + \int \omega_3(x,x',t)\, f[u_1(x',t)]\,dx' - \int \omega_4(x,x',t)\, f[u_2(x',t)]\,dx' + s_2(x,t)\,v(x,t) - h_2, \tag{2}$$

where $\tau$ is the time constant of neuronal dynamics, assumed to be the same for both kinds of neurons; $\omega_i(x,x',t)$, $i = 1, \ldots, 4$, are the synaptic strengths of excitatory-excitatory, inhibitory-excitatory, excitatory-inhibitory, and inhibitory-inhibitory synapses respectively; $s_i(x,t)$, $i = 1, 2$, are the synaptic strengths of the connections between the external input $v(x,t)$ and neurons in the two fields; and $h_i$, $i = 1, 2$ ($h_i > 0$), are the resting potentials towards which $u_1$ and $u_2$ decay in the absence of stimuli.

The assumption of spatial homogeneity permits one to rewrite these equations as

$$\tau \frac{d u_1(t)}{dt} = -u_1(t) + f[u_1(t)] \int \omega_1(x,x',t)\,dx' - f[u_2(t)] \int \omega_2(x,x',t)\,dx' + s_1(x,t)\,v(x,t) - h_1, \tag{3}$$

and

$$\tau \frac{d u_2(t)}{dt} = -u_2(t) + f[u_1(t)] \int \omega_3(x,x',t)\,dx' - f[u_2(t)] \int \omega_4(x,x',t)\,dx' + s_2(x,t)\,v(x,t) - h_2. \tag{4}$$

Regarding  the  equations  governing  the  time  variation  of  the  synapses,  they  will 
be  assumed  to  be  of  a  Hebbian  type.''  Two  possible  versions  of  the  Hebbian  rule  will 
be  considered  as  a  way  of  comparing  their  implications  for  the  system’s  behavior: 

A. The first version is the one adopted by most of the authors in the literature. It assumes that the synapses vary proportionally to the product of the outputs of the pre- and post-synaptic neurons, denoted here by $u_{\mathrm{pre}}$ and $u_{\mathrm{post}}$,

$$\tau' \frac{\partial \omega_i(x,x',t)}{\partial t} = -\omega_i(x,x',t) + c_i(x,x')\, f[u_{\mathrm{post}}(t)]\, f[u_{\mathrm{pre}}(t)]. \tag{5}$$

B. The second version is the one proposed recently by the author.^ In it the synapses where the pre-synaptic neuron is excitatory obey the same rule as Eq. (5), but the synapses where the pre-synaptic neuron is inhibitory obey the following rule:

$$\tau' \frac{\partial \omega_i(x,x',t)}{\partial t} = -\omega_i(x,x',t) + c_i(x,x')\,\bigl[1 - f[u_{\mathrm{post}}(t)]\bigr]\, f[u_{\mathrm{pre}}(t)]. \tag{6}$$

Let us call the first version type A, and the second one type B. In the type A version, the synaptic strength of both excitatory and inhibitory synapses will always increase when the pre- and post-synaptic neurons are firing in synchrony. In the type B version, this will happen as well for excitatory synapses, but inhibitory synapses will increase only when the pre-synaptic neuron is firing and the post-synaptic one is not firing. The constant $\tau'$ appearing in the above equations is the time constant characteristic of the synaptic dynamics and is the same for both versions and for the four synaptic types. The quantities $c_i(x,x')$, $i = 1, \ldots, 4$, are assumed to be given by

$$c_i(x,x') = \begin{cases} c_i & \text{for } |x - x'| \le \ell_i; \\ 0 & \text{for } |x - x'| > \ell_i, \end{cases}$$

where $\ell_i$ is the maximum distance within which the $i$th synaptic type can have non-zero strength.
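
The difference between the two Hebbian versions can be made concrete by writing out the right-hand sides of Eqs. (5) and (6) for a single synapse. This is a minimal sketch, not the author's code: the function names, the argument `pre_is_inhibitory`, and the convention that $u = 0$ counts as not firing are assumptions introduced for illustration.

```python
def step(u: float) -> int:
    """Binary output f[u]; u == 0 is treated as not firing (assumption)."""
    return 1 if u > 0 else 0


def hebb_type_a(w: float, c: float, u_post: float, u_pre: float, tau_syn: float) -> float:
    """dw/dt under the type A rule: growth when pre and post fire together."""
    return (-w + c * step(u_post) * step(u_pre)) / tau_syn


def hebb_type_b(w: float, c: float, u_post: float, u_pre: float,
                tau_syn: float, pre_is_inhibitory: bool) -> float:
    """dw/dt under the type B rule.

    Excitatory pre-synaptic neurons follow the type A rule; inhibitory ones
    grow only when the pre-synaptic neuron fires and the post-synaptic one
    does not.
    """
    if pre_is_inhibitory:
        return (-w + c * (1 - step(u_post)) * step(u_pre)) / tau_syn
    return hebb_type_a(w, c, u_post, u_pre, tau_syn)


# Inhibitory synapse, both neurons firing: grows under A, only decays under B.
print(hebb_type_a(0.0, 1.0, 1.0, 1.0, 10.0))
print(hebb_type_b(0.0, 1.0, 1.0, 1.0, 10.0, pre_is_inhibitory=True))
```

In both versions the term in `c` is either fully on or fully off inside a quadrant, which is why the synaptic equations reduce to simple exponential relaxations there.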

As the external input is excitatory, the synapses between it and the neurons in the fields will always change with time according to the type A version of the Hebbian rule,

$$\tau'' \frac{\partial s_i(x,t)}{\partial t} = -s_i(x,t) + b_i\, f[u_i(t)]\, v(x,t), \tag{7}$$

where $b_i$, $i = 1, 2$, are constants, and $\tau''$ is the time constant characteristic of the dynamics of the input synapses. It is assumed to be different from $\tau'$.

Since $f[u]$ is the step function, one can analyse the system’s behavior in the $u_1$-$u_2$ plane, as Amari did. For each quadrant of this plane, the $f[u]$’s are constant, so that Eqs. (5) and (6) can easily be solved, having solutions decaying exponentially with time as $\exp(-t/\tau')$,

$$\omega_i(x,x',t) = \omega_i(x,x')\, e^{-t/\tau'} \; (+\, c_i),$$

where the constants $c_i$ were put in between brackets because they appear depending on the quadrant and the version of the Hebbian rule adopted.

For simplicity, we are going to assume that the external input is the same for all positions, and is kept constant up to a certain time $t_0$ and silenced immediately after that,

$$v(x,t) = \begin{cases} v & \text{for } 0 \le t \le t_0; \\ 0 & \text{for } t > t_0. \end{cases}$$

This implies that the strengths of the input synapses have also to be spatially homogeneous and to decay exponentially with time, according to $\exp(-t/\tau'')$,

$$s_i(t) = s_i\, e^{-t/\tau''} \; (+\, b_i v), \quad \text{for } t < t_0,$$

and

$$s_i(t) = s_i\, e^{-t/\tau''}, \quad \text{for } t > t_0.$$

As it is possible to determine the temporal behavior of the synaptic strengths $\omega_i$ and $s_i$ for each quadrant, the equations for the membrane potentials $u_i$ are reduced to the general type

$$\tau \dot{u}_i = -u_i + f(t),$$

where the $f(t)$’s are known functions of time, one for each quadrant. Equations of this type can be solved using the integrating factor $e^{t/\tau}$,

$$e^{t/\tau}\left(\dot{u} + \frac{u}{\tau}\right) = \frac{1}{\tau}\, e^{t/\tau} f(t) \;\rightarrow\; u(t) = e^{-t/\tau}\left[\frac{1}{\tau}\int e^{t/\tau} f(t)\,dt + k\right],$$

where $k$ is a constant.
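
As a worked illustration of the integrating-factor step, consider a forcing term made of one decaying exponential plus a constant, which is the shape the synaptic solutions produce inside a quadrant. The symbols $A$ and $B$ are placeholders introduced here, and the result assumes $\tau \neq \tau'$.

```latex
% Forcing term: f(t) = A e^{-t/\tau'} + B   (A, B are illustrative constants)
\begin{aligned}
u(t) &= e^{-t/\tau}\left[\frac{1}{\tau}\int e^{t/\tau}\left(A e^{-t/\tau'} + B\right)dt + k\right] \\
     &= e^{-t/\tau}\left[\frac{A}{\tau}\,\frac{e^{t\left(1/\tau - 1/\tau'\right)}}{1/\tau - 1/\tau'} + B\,e^{t/\tau} + k\right] \\
     &= \frac{\tau'}{\tau' - \tau}\,A\,e^{-t/\tau'} + B + k\,e^{-t/\tau}.
\end{aligned}
```

The prefactor $\tau'/(\tau' - \tau)$ arises because $(1/\tau)/(1/\tau - 1/\tau') = \tau'/(\tau' - \tau)$; the same manipulation with $\tau''$ in place of $\tau'$ gives the prefactor of the input-synapse term.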

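Hence, representing the global state of the network by the vector $\mathbf{u} = (u_1, u_2)$, one can write the solutions of Eqs. (3) and (4), one for each quadrant, as

1st Quadrant

$$\mathbf{u}(t) = \frac{\tau'}{\tau' - \tau}\mathbf{W}^{\mathrm{I}} e^{-t/\tau'} + \frac{\tau''}{\tau'' - \tau}\, v\,\mathbf{s}\, e^{-t/\tau''} + \mathbf{k}^{\mathrm{I}} e^{-t/\tau} + v^2 \mathbf{b}^{\mathrm{I}} + \mathbf{L}^{\mathrm{I}} + \mathbf{h}, \quad \text{for } t < t_0, \tag{8}$$

$$\mathbf{u}(t) = \frac{\tau'}{\tau' - \tau}\mathbf{W}^{\mathrm{I}} e^{-t/\tau'} + \mathbf{k}^{\mathrm{I}} e^{-t/\tau} + \mathbf{L}^{\mathrm{I}} + \mathbf{h}, \quad \text{for } t > t_0; \tag{9}$$

2nd Quadrant

$$\mathbf{u}(t) = \frac{\tau'}{\tau' - \tau}\mathbf{W}^{\mathrm{II}} e^{-t/\tau'} + \frac{\tau''}{\tau'' - \tau}\, v\,\mathbf{s}\, e^{-t/\tau''} + \mathbf{k}^{\mathrm{II}} e^{-t/\tau} + v^2 \mathbf{b}^{\mathrm{II}} + \mathbf{L}^{\mathrm{II}} + \mathbf{h}, \quad \text{for } t < t_0, \tag{10}$$

$$\mathbf{u}(t) = \frac{\tau'}{\tau' - \tau}\mathbf{W}^{\mathrm{II}} e^{-t/\tau'} + \mathbf{k}^{\mathrm{II}} e^{-t/\tau} + \mathbf{L}^{\mathrm{II}} + \mathbf{h}, \quad \text{for } t > t_0; \tag{11}$$

3rd Quadrant

$$\mathbf{u}(t) = \frac{\tau''}{\tau'' - \tau}\, v\,\mathbf{s}\, e^{-t/\tau''} + \mathbf{k}^{\mathrm{III}} e^{-t/\tau} + \mathbf{h}, \quad \text{for } t < t_0, \tag{12}$$

$$\mathbf{u}(t) = \mathbf{k}^{\mathrm{III}} e^{-t/\tau} + \mathbf{h}, \quad \text{for } t > t_0; \tag{13}$$

4th Quadrant

$$\mathbf{u}(t) = \frac{\tau'}{\tau' - \tau}\mathbf{W}^{\mathrm{IV}} e^{-t/\tau'} + \frac{\tau''}{\tau'' - \tau}\, v\,\mathbf{s}\, e^{-t/\tau''} + \mathbf{k}^{\mathrm{IV}} e^{-t/\tau} + v^2 \mathbf{b}^{\mathrm{IV}} + \mathbf{L}^{\mathrm{IV}} + \mathbf{h}, \quad \text{for } t < t_0, \tag{14}$$

$$\mathbf{u}(t) = \frac{\tau'}{\tau' - \tau}\mathbf{W}^{\mathrm{IV}} e^{-t/\tau'} + \mathbf{k}^{\mathrm{IV}} e^{-t/\tau} + \mathbf{L}^{\mathrm{IV}} + \mathbf{h}, \quad \text{for } t > t_0. \tag{15}$$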

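The interesting point about these solutions is that they are the same for both versions of the Hebbian rule considered in this paper; only some of the constant vectors, defined below, are different. The constant vectors appearing in the above solutions are the following (apart from the $\mathbf{L}$’s, all of them are the same for the two versions of the Hebbian rule considered):

$$\mathbf{W}^{\mathrm{I}} = (\omega_1 - \omega_2,\ \omega_3 - \omega_4), \quad \mathbf{W}^{\mathrm{II}} = (-\omega_2,\ -\omega_4), \quad \mathbf{W}^{\mathrm{IV}} = (\omega_1,\ \omega_3),$$

where the $\omega_i$’s are defined as

$$\omega_i = \int_{-\infty}^{\infty} \omega_i(x,x')\,dx';$$

$$\mathbf{s} = (s_1, s_2); \quad \mathbf{b}^{\mathrm{I}} = (b_1, b_2); \quad \mathbf{b}^{\mathrm{II}} = (0, b_2); \quad \mathbf{b}^{\mathrm{IV}} = (b_1, 0); \quad \mathbf{h} = (-h_1, -h_2);$$

$$\mathbf{L}^{\mathrm{I}} = (2\ell_1 c_1 - 2\ell_2 c_2,\ 2\ell_3 c_3 - 2\ell_4 c_4); \quad \mathbf{L}^{\mathrm{II}} = (0,\ -2\ell_4 c_4); \quad \mathbf{L}^{\mathrm{IV}} = (2\ell_1 c_1,\ 0) \quad \text{(Type A)};$$

and

$$\mathbf{L}^{\mathrm{I}} = (2\ell_1 c_1,\ 0); \quad \mathbf{L}^{\mathrm{II}} = (0,\ -2\ell_4 c_4); \quad \mathbf{L}^{\mathrm{IV}} = (2\ell_1 c_1,\ 0) \quad \text{(Type B)};$$

where

$$2\ell_i c_i = \int_{-\infty}^{\infty} c_i(x,x')\,dx' = \int_{x-\ell_i}^{x+\ell_i} c_i(x,x')\,dx'.$$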


Thus, one is now in a position to predict the system’s behavior for a given set of constants and an initial value for $\mathbf{u}$. Instead of doing that in this paper, which would involve the (quite arbitrary) stipulation of all the constants appearing in the equations, we are simply going to verify what sorts of dynamical behaviors are compatible with Eqs. (8)-(15).
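
The closed-form solutions within a quadrant all have the shape "decaying synaptic term plus neuronal transient plus constant". A quick numerical sanity check of that shape is sketched below for one scalar component, assuming a forcing term $A e^{-t/\tau'} + B$; the names `closed_form`, `integrate`, `A`, `B`, and the numerical values are illustrative assumptions, not constants from the paper.

```python
import math


def closed_form(t, u0, A, B, tau, tau_syn):
    """Solution of tau*du/dt = -u + A*exp(-t/tau_syn) + B with u(0) = u0."""
    coeff = tau_syn / (tau_syn - tau)   # prefactor produced by the integrating factor
    k = u0 - coeff * A - B              # constant fixed by the initial condition
    return coeff * A * math.exp(-t / tau_syn) + k * math.exp(-t / tau) + B


def integrate(t_end, u0, A, B, tau, tau_syn, dt=1e-3):
    """Fourth-order Runge-Kutta integration of the same equation."""
    def rhs(t, u):
        return (-u + A * math.exp(-t / tau_syn) + B) / tau

    u, t = u0, 0.0
    for _ in range(round(t_end / dt)):
        k1 = rhs(t, u)
        k2 = rhs(t + dt / 2, u + dt * k1 / 2)
        k3 = rhs(t + dt / 2, u + dt * k2 / 2)
        k4 = rhs(t + dt, u + dt * k3)
        u += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return u


params = dict(u0=0.3, A=2.0, B=-0.5, tau=1.0, tau_syn=20.0)
error = abs(integrate(5.0, **params) - closed_form(5.0, **params))
print(f"difference between numerical and closed-form solution: {error:.2e}")
```

The check only covers the motion inside a single quadrant; once the state vector crosses an axis, the constants change and the solution has to be restarted with the new quadrant's values.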


3.  DYNAMICAL  BEHAVIORS  ALLOWED  BY  THE  EQUATIONS 

There are two important times which enable one to determine the behavior of the system, namely $t_0$, when the external input stops being applied, and $t = \infty$, which gives the asymptotic value of $\mathbf{u}$ in the absence of external inputs. Eqs. (8)-(15) allow us to calculate the values of $\mathbf{u}$ at those times, depending on the initial quadrant in which $\mathbf{u}$ is. We will assume that the time $t_0$ during which the external signal is applied is much larger than the time constant of neuronal dynamics $\tau$, but much smaller than the two time constants of synaptic dynamics, which will be assumed to have values of the same order,

$$\tau \ll t_0 \ll \tau' \sim \tau''.$$

Hence, one can write

$$e^{-t_0/\tau} \approx 0; \quad \text{and} \quad e^{-t_0/\tau'} \approx 1 - t_0/\tau', \quad e^{-t_0/\tau''} \approx 1 - t_0/\tau'',$$

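which leads us to rewrite Eqs. (8)-(15) as (the superscripts labelling the $\mathbf{u}$'s indicate the quadrant)

1st Quadrant

$$\mathbf{u}^{\mathrm{I}}(t_0) \approx \frac{\tau'}{\tau' - \tau}\left(1 - \frac{t_0}{\tau'}\right)\mathbf{W}^{\mathrm{I}} + \frac{\tau''}{\tau'' - \tau}\left(1 - \frac{t_0}{\tau''}\right) v\,\mathbf{s} + v^2 \mathbf{b}^{\mathrm{I}} + \mathbf{L}^{\mathrm{I}} + \mathbf{h}; \tag{16}$$

$$\mathbf{u}^{\mathrm{I}}(\infty) = \mathbf{L}^{\mathrm{I}} + \mathbf{h}; \tag{17}$$

2nd Quadrant

$$\mathbf{u}^{\mathrm{II}}(t_0) \approx \frac{\tau'}{\tau' - \tau}\left(1 - \frac{t_0}{\tau'}\right)\mathbf{W}^{\mathrm{II}} + \frac{\tau''}{\tau'' - \tau}\left(1 - \frac{t_0}{\tau''}\right) v\,\mathbf{s} + v^2 \mathbf{b}^{\mathrm{II}} + \mathbf{L}^{\mathrm{II}} + \mathbf{h}; \tag{18}$$

$$\mathbf{u}^{\mathrm{II}}(\infty) = \mathbf{L}^{\mathrm{II}} + \mathbf{h}; \tag{19}$$

3rd Quadrant

$$\mathbf{u}^{\mathrm{III}}(t_0) \approx \frac{\tau''}{\tau'' - \tau}\left(1 - \frac{t_0}{\tau''}\right) v\,\mathbf{s} + \mathbf{h}; \tag{20}$$

$$\mathbf{u}^{\mathrm{III}}(\infty) = \mathbf{h}; \tag{21}$$

4th Quadrant

$$\mathbf{u}^{\mathrm{IV}}(t_0) \approx \frac{\tau'}{\tau' - \tau}\left(1 - \frac{t_0}{\tau'}\right)\mathbf{W}^{\mathrm{IV}} + \frac{\tau''}{\tau'' - \tau}\left(1 - \frac{t_0}{\tau''}\right) v\,\mathbf{s} + v^2 \mathbf{b}^{\mathrm{IV}} + \mathbf{L}^{\mathrm{IV}} + \mathbf{h}; \tag{22}$$

$$\mathbf{u}^{\mathrm{IV}}(\infty) = \mathbf{L}^{\mathrm{IV}} + \mathbf{h}. \tag{23}$$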

Hence, the $\mathbf{u}^{i}(t_0)$ and $\mathbf{u}^{i}(\infty)$ do not depend on the initial state $\mathbf{u}^{i}(0)$. Irrespective of the point where $\mathbf{u}(t)$ starts off or enters in the $i$th quadrant, it will always go towards $\mathbf{u}^{i}(t_0)$ or $\mathbf{u}^{i}(\infty)$, depending on $t$ being smaller or greater than $t_0$.
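
Since the target vectors depend only on the constants, they can be tabulated for any chosen parameter set. The following is a hedged sketch for the type A rule only: the function name `targets_type_a`, the argument layout, and the example numbers are assumptions made here, the neuronal transient $e^{-t_0/\tau}$ is dropped as in the text, and the exact exponentials are used instead of their first-order expansions.

```python
import math

QUADRANTS = ("I", "II", "III", "IV")


def targets_type_a(w, lc, s, b, h, v, t0, tau, tau_w, tau_s):
    """Approximate target vectors u^i(t0) for each quadrant under the type A rule.

    w  : (w1, w2, w3, w4), integrated initial synaptic strengths
    lc : (2*l1*c1, 2*l2*c2, 2*l3*c3, 2*l4*c4)
    s, b, h : pairs (s1, s2), (b1, b2), (h1, h2) with h1, h2 > 0
    """
    w1, w2, w3, w4 = w
    lc1, lc2, lc3, lc4 = lc
    b1, b2 = b
    W = {"I": (w1 - w2, w3 - w4), "II": (-w2, -w4), "III": (0.0, 0.0), "IV": (w1, w3)}
    L = {"I": (lc1 - lc2, lc3 - lc4), "II": (0.0, -lc4), "III": (0.0, 0.0), "IV": (lc1, 0.0)}
    B = {"I": (b1, b2), "II": (0.0, b2), "III": (0.0, 0.0), "IV": (b1, 0.0)}
    gain_w = tau_w / (tau_w - tau) * math.exp(-t0 / tau_w)  # weight of the W term at t0
    gain_s = tau_s / (tau_s - tau) * math.exp(-t0 / tau_s)  # weight of the v*s term at t0
    return {
        q: tuple(
            gain_w * W[q][j] + gain_s * v * s[j] + v * v * B[q][j] + L[q][j] - h[j]
            for j in range(2)
        )
        for q in QUADRANTS
    }


# Purely illustrative numbers; with no input and no initial weights only L and h remain.
example = targets_type_a(w=(0, 0, 0, 0), lc=(3.0, 0, 0, 0), s=(0, 0), b=(0, 0),
                         h=(1.0, 2.0), v=0.0, t0=10.0, tau=1.0, tau_w=100.0, tau_s=100.0)
print(example)
```

Classifying each target vector by the signs of its two components then tells which quadrant the state vector is heading for, which is the information the following analysis relies on.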

The first type of dynamical behavior to be checked against these equations is the full oscillatory one, where by full oscillation one means the system vector passing through all four quadrants. This is only possible for $t < t_0$.[1] There are two possible types of full oscillations, clockwise and anti-clockwise (see Figure 1).

[1] One can clearly see this, because $\mathbf{u}^{\mathrm{III}}(\infty) = \mathbf{h} = (-h_1, -h_2) \in$ 3rd quadrant, so that $\mathbf{u}$ never gets out of the 3rd quadrant once it enters there after $t_0$.

FIGURE 1 The two possible types of full oscillatory behavior. Only the anti-clockwise one is compatible with the equations.

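TABLE 1 Types of Full Oscillations

Clockwise                                              Anti-clockwise

$\mathbf{u}^{\mathrm{I}}(t_0) \in \mathrm{IV}$;        $\mathbf{u}^{\mathrm{I}}(t_0) \in \mathrm{II}$;
$\mathbf{u}^{\mathrm{II}}(t_0) \in \mathrm{I}$;        $\mathbf{u}^{\mathrm{II}}(t_0) \in \mathrm{III}$;
$\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{II}$;      $\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{IV}$;
$\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{III}$;      $\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{I}$;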

To test whether or not a full oscillation is compatible with Eqs. (16), (18), (20), and (22), one can decompose $\mathbf{u}^{\mathrm{I}}(t_0)$ through $\mathbf{u}^{\mathrm{IV}}(t_0)$ into their components along vectors $(1, 0)$ and $(0, 1)$ and check whether or not the conditions (Table 1) can be simultaneously satisfied.
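
Once the quadrant containing each target vector is known, checking the conditions of Table 1 is a simple lookup. The sketch below assumes the targets have already been classified into quadrant labels; the function name `full_oscillation` and the dictionary layout are hypothetical.

```python
# Each map sends a quadrant to the quadrant its target vector must lie in.
ANTI_CLOCKWISE = {"I": "II", "II": "III", "III": "IV", "IV": "I"}
CLOCKWISE = {"I": "IV", "IV": "III", "III": "II", "II": "I"}


def full_oscillation(target_of):
    """Return the direction of a full oscillation, or None if there is none.

    target_of maps each quadrant label to the quadrant containing u^i(t0).
    """
    if target_of == ANTI_CLOCKWISE:
        return "anti-clockwise"
    if target_of == CLOCKWISE:
        return "clockwise"
    return None


print(full_oscillation({"I": "II", "II": "III", "III": "IV", "IV": "I"}))
print(full_oscillation({"I": "II", "II": "I", "III": "III", "IV": "IV"}))
```

A map in which two quadrants point at each other, as in the second call, corresponds to the two-quadrant oscillations discussed later in this section rather than to a full oscillation.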

Performing the above-described analysis, one finds out that only the anti-clockwise behavior is allowed by the equations, and that this is the case for both type A and type B Hebbian rules. To show this here would involve writing down many algebraic inequalities, and this was not done for reasons of conciseness. For the type B Hebbian rule, the algebraic inequalities imply that the anti-clockwise oscillation is possible only for $\omega_2 > \omega_1$, but this does not happen for the type A Hebbian rule, where both $\omega_1 > \omega_2$ and $\omega_2 > \omega_1$ are allowed; the condition for $\omega_1 > \omega_2$ being

$$2\ell_2 c_2 > \frac{\tau'}{\tau' - \tau}\left(1 - \frac{t_0}{\tau'}\right)(\omega_1 - \omega_2) + \frac{\tau''}{\tau'' - \tau}\left(1 - \frac{t_0}{\tau''}\right) v s_1 + 2\ell_1 c_1 + v^2 b_1 - h_1.$$

It is interesting to mention here Amari’s result concerning full oscillatory behavior.^ In his paper Amari mentioned only the anti-clockwise oscillation and found that its existence condition is $\omega_2 > \omega_1$. However, any comparison between the two results would be premature, because Amari’s $\omega_i$’s are not the same as our $\omega_i$’s. Amari’s $\omega_i$’s are the full strengths of the time-invariant synapses,

$$\omega_i^{A} = \int_{-\infty}^{\infty} \omega_i^{A}(x,x')\,dx',$$

where the superscript $A$ indicates Amari, while in our case the full strengths of the synapses are time dependent and are not given by integrals of the $\omega_i$’s solely,

$$\int_{-\infty}^{\infty} \omega_i(x,x',t)\,dx' = e^{-t/\tau'}\int_{-\infty}^{\infty} \omega_i(x,x')\,dx' \left(+ \int_{-\infty}^{\infty} c_i(x,x')\,dx'\right),$$

where the integral of $c_i(x,x')$ was put in between brackets because its presence in the above equation depends on the quadrant being considered.

Another interesting dynamical behavior whose possibility of existence can be checked with the use of Eqs. (16)-(23) is an oscillation between only two quadrants, which will be called a two-quadrant oscillation (see Figure 2).

For each of the four kinds of two-quadrant oscillations, there are nine possible dynamical cases. A list of them for the oscillations between the first and the second quadrants is given in Table 2.

Each pair of conditions in Table 2 has to be satisfied together with the conditions in its heading. Similar tables exist for the other three classes.


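TABLE 2

$\mathbf{u}^{\mathrm{I}}(t_0) \in \mathrm{II}$ and $\mathbf{u}^{\mathrm{II}}(t_0) \in \mathrm{I}$

$\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{III}$;    $\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{III}$;    $\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{III}$;
$\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{IV}$;      $\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{III}$;     $\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{I}$;

$\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{II}$;     $\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{II}$;     $\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{II}$;
$\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{IV}$;      $\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{III}$;     $\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{I}$;

$\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{IV}$;     $\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{IV}$;     $\mathbf{u}^{\mathrm{III}}(t_0) \in \mathrm{IV}$;
$\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{IV}$;      $\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{III}$;     $\mathbf{u}^{\mathrm{IV}}(t_0) \in \mathrm{I}$;

FIGURE 2 This figure shows schematically the four possible classes of two-quadrant oscillation, each class comprising nine types given by all possible behaviors in the two quadrants left. Notice that because, as soon as the system vector enters one of the “coupled” quadrants, it starts moving linearly towards the other one in the couple, it ends up doing small oscillations around the intersection of the line joining the two vectors in the “coupled” quadrants and the coordinate axis separating them.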


Combining the conditions for two-quadrant oscillations with Eqs. (16)-(23), one obtains a large number of algebraic inequalities. Analogously to the full oscillatory case, the inequalities reveal that two-quadrant oscillations are only possible while $t < t_0$, and most of the possible cases are ruled out by them. Only the three cases shown in Figure 3 are allowed.

FIGURE 3 The only possible kinds of two-quadrant oscillations allowed by the equations. They are allowed for both types of Hebbian rules, but the third one can only exist for the type B rule if $\omega_2 > \omega_1$.


The  three  cases  shown  in  Figure  3  are  allowed  to  exist  for  both  versions  of  the 
Hebbian  rule  considered  in  this  paper.  However,  the  case  in  which  u^(<o)  €  ^ ^  and 
$\in \mathrm{I}$ can only exist for type B if $\omega_2 > \omega_1$. An interesting case is the one in which oscillations between the second and the third, and between the first and the fourth quadrants are allowed to exist. Then, depending on whether the system starts off in the left or the right side of the $u_1$-$u_2$ plane, it will stay there and do small oscillations between the two quadrants of the initial half without jumping to the other half.

The cases left to analyse are the ones in which the system does not have any oscillatory behavior. For those cases the system can have only stable states, and there are four possibilities, namely having one, two, three, or four stable states. Obviously, for $t < t_0$ the system does not have a strict stable state because it will decay from its state at $t = t_0$ towards the allowed states for $t > t_0$. As we have seen above, for $t > t_0$ the system cannot have any oscillatory behavior, and then it can have only stable states. The possible stable states for $t > t_0$ are shown in Figure 4.

As we said before, for $t < t_0$ the system cannot have any real stable states, but we can define "stability prior to the vanishing of the external input," i.e., the system having one, two, three, or four stable points while the input is being applied,[*] and use this definition for the oscillatory cases to find out what stable behaviors of this kind are allowed by the equations.


[*] Notice that one of the possible cases of two-quadrant oscillations shown in Figure 3 has a stable state in this sense in the first quadrant.

The author has analyzed the algebraic inequalities for 48 possibilities of stability before $t_0$, and found that there are four monostable cases allowed (shown in Figure 5), six bistable cases allowed (shown in Figure 6), three tri-stable cases allowed (shown in Figure 7), and one (the only possible one) case having four stable states allowed (shown in Figure 8).

All stable states shown in Figures 5-8 are allowed for both versions of the Hebbian rule considered. Only for three of them (indicated in the figures), the condition $\omega_2 > \omega_1$ follows as an existence condition for type B rules. The cases in which the system vector starts off in a quadrant which contains a stable state, i.e., it cannot leave the quadrant, are Amari's transient states.


FIGURE 4 This figure shows the possible behaviors of the system for $t \to \infty$. The type B Hebbian rule allows only the four behaviors underlined, while type A allows all six. The number of stable states and the conditions for each behavior are given below each graph.

FIGURE 5 The four monostable cases before $t = t_0$ allowed by the equations. They are allowed for both versions of the Hebbian rule, but two of them (indicated below the graph) are only possible for type B if $\omega_2 > \omega_1$.

FIGURE 6 The six bistable cases before $t = t_0$ allowed by the equations. They are allowed for both versions of the Hebbian rule, without any restrictions on the relative values of $\omega_1$ and $\omega_2$.


FIGURE 7 The three tri-stable cases before $t = t_0$ allowed by the equations. They are allowed for both versions of the Hebbian rule, but one of them (indicated below the respective graph) is only possible for type B if $\omega_2 > \omega_1$.

FIGURE 8 The only case having four stable states allowed is the only possible one. The system vector stays in the quadrant where it is initially while the external input is being applied.


4.  CONCLUSIONS 

This paper presented an example of a simple neural network which can be mathematically modeled and have its behavior fully understood analytically. Very few dynamical systems have this property, but the ones that have it can be used to give us some feeling about the behavior of more complex ones. The network studied in this work can be considered as an approximation for a network which receives uniform stimulation over a large part of it, so that that part has roughly homogeneous activity.

We have shown, by adopting and extending Amari's approach to a similar network, that this network can have a variety of dynamical behaviors: full oscillations, two-quadrant oscillations, monostable states, bistable states, etc. The existence conditions found for these behaviors are the most general and, therefore, the weakest possible, in the sense of not assuming any particular set of values for the constants and parameters of the network (apart from the condition involving the time constants). For a given set of constants and parameters, especially for the biologically plausible ones, the mathematical inequalities to be satisfied would become tighter and the number of possible behaviors very much reduced.


A Cellular Automaton to Embed Genetic Search


A  non-deterministic  cellular  automaton  with  periodic  boundary  conditions, 
whose  temporal  evolution  resembles  an  artificial-life  world,  is  presented. 
This  artificial-life  activity  takes  place  in  a  two-dimensional  world,  where 
worm-like  organisms  roam  around  mating,  reproducing,  and  being  selected. 
Since  the  motivation  for  this  work  has  been  to  embed  some  form  of  genetic 
search in cellular automata, the automaton is described in terms of its general capabilities to act as a framework within which genetic search problems can be defined. However, it is not an aim of the paper to discuss in
detail  any  particular  application.  Although  the  concept  of  search  has  been 
traditionally  associated  with  function  optimization  and  with  strategies  for 
solving  prespecified  problems,  these  are  not  the  connotations  of  search  we 
mean here; rather, we refer to the process of exploring the space of possible genomes in particular universes, without any concern for optimization
or  preconceived  evolutionary  paths  to  be  followed.  Because  of  this,  and 
also  because  the  built-in  selection  process  can  be  better  seen  as  preserving 
the  non-deleterious  features  of  the  organisms  (in  contrast  to  selecting  for 
the  most  adapted  ones),  the  nature  of  the  evolutionary  process  eventually 
achieved  should  be  seen  as  an  instance  of  the  exaptationist  standpoint  in 
evolutionary  theory.  The  bridge  between  the  activity  of  the  organisms  and 
the genetic search process is made by allowing that the main constituent of
the  organisms’  bodies  be  the  genomes  that  define  the  points  in  the  search 
space  under  question.  The  fact  that  the  cellular  automaton  relies  upon  only 
four  states  per  cell  allows  for  its  use  in  a  number  of  ways,  some  of  which 
are  discussed;  indeed,  the  actual  cellular  automaton  described  is  just  one 
possible example of a large family. This flexibility opens up the possibility of the development of a new class of models to study emergence and
self-organization  in  evolutionary  processes,  mainly  from  the  standpoint  of 
artificial  life. 


1.  INTRODUCTION 

Originally  conceived  as  an  abstract  model  of  self-reproduction,*®  cellular  automata 
are  currently  considered  as  models  for  complex  natural  systems  that  contain  a 
large  number  of  simple  and  locally  interconnected  elements.  They  can  be  thought 
of  as  mathematical  or  computational  entities,  as  well  as  discrete  dynamic  systems. 
Cellular automata are made up of a set of elements (the cells) that are organized in an n-dimensional lattice (the cellular space), so that at any time, each cell can take on one among a set of discrete values (the cell states). The states of all cells in the lattice are updated (usually) synchronously, the new state of each cell being dependent upon the state of its neighborhood, i.e., its current state together with the states of a group of neighboring cells. The updating of each cell state is achieved by applying to the cell neighborhood a set of deterministic or non-deterministic transition rules which are the same for the entire cell space, providing a sort of underlying physics for the cellular automaton (see Wolfram and Gutowitz for extensive accounts of both theoretical and practical aspects concerning cellular automata).
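
As a minimal sketch of the synchronous updating just described, the following Python fragment steps a one-dimensional lattice with periodic boundaries. The function names and the toy parity rule (sum of the two outer neighbours modulo 2) are illustrative assumptions chosen for brevity; they are not the transition rules of the automaton presented in this paper.

```python
from typing import Callable, List, Sequence


def step(cells: Sequence[int], rule: Callable[[int, int, int], int]) -> List[int]:
    """Synchronously update a 1-D lattice with periodic boundaries.

    Every new state is computed from the OLD configuration only, which is
    what makes the update synchronous.
    """
    n = len(cells)
    return [rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n]) for i in range(n)]


def parity_rule(left: int, centre: int, right: int) -> int:
    """Toy deterministic rule (an assumption): outer neighbours added modulo 2."""
    return (left + right) % 2


world = [0, 0, 0, 1, 0, 0, 0]
after_one = step(world, parity_rule)
after_two = step(after_one, parity_rule)
```

The same rule is applied at every cell, so the rule function plays the role of the "underlying physics" of the lattice; a non-deterministic rule would simply draw its result at random inside the rule function.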

By  genetic  or  evolutionary  search  we  mean  a  computational  model  of  search 
gleaned from concepts in biological evolution, in which non-deterministic mechanisms provide variability and selection of "genome"-like structures that represent
the  points  of  the  search  space.  As  new  genomes  are  created,  the  space  is  explored. 
New  genomes  are  created  through  a  sexual  reproduction  process  involving  already 
existing  genomes,  which  implies  that  new  genomes  typically  contain  sequences  of 
“genes”  of  their  “parents.”  By  calling  these  sequences  building  blocks,  we  can  think 
of  the  search  as  a  process  in  which  building  blocks  are  created  and  built  upon,  thus 
allowing the exploration of the search space.

Our aim in the paper is to present a two-dimensional cellular automaton with four states per cell, within which it is possible to embed a form of genetic search.[1] As a consequence, the characteristic feature of the automaton is that its temporal evolution very strongly suggests an artificial-life-type world, where worm-like organisms roam around, mating, reproducing, and being selected. The bridge between the activity of the organisms and the genetic search process is made by allowing that the main constituent of the organisms' bodies be the genomes that define the points in the search space under question.

[1] Since two out of the four possible states are indeed classes of states, in this sense it would be more appropriate to refer to an actual family of cellular automata.

The  most  well-known  forms  of  genetic  search  in  the  literature  are  the  genetic 
algorithms  and  the  evolutionary  strategies,’'  although  other  methods  also  exist,  such 
as Koza's genetic programming (see Koza, for example) based on searching on a population of Lisp programs, the one used in MacLennan in a study of the evolution of communication, and the so-called extended genetic algorithm used in Werner and Dyer for the same kind of application; as far as we know no method has yet been devised for a cellular automaton. Although the concept of search has been traditionally associated with function optimization and with strategies for solving prespecified problems, these are not the connotations of search we mean here; rather, we refer to the process of exploring the space of possible genomes in particular universes, without any concern for optimization or preconceived evolutionary
paths  to  be  followed.  Therefore,  the  usual  characterization  of  genetic  search  in  terms 
of  creation  of  “useful”  building  blocks  is  not  appropriate  here;  we  will  return  to  this 
point  in  subsection  2.4. 

As far as genetic search is concerned, what we provide is a cellular automaton that, due to the features above, can be seen as a framework where a particular genetic search can be embedded. The emphasis of the presentation is on the description of the automaton itself. The discussion of how the framework can be used is given only in general terms; it is beyond the scope of the paper to discuss in detail any particular application. In the next section we present the automaton by relying, whenever possible, on metaphorical concepts suggested by the artificial-life-type processes it supports, namely, movement, selection, mating, and reproduction; any details related to the actual state transitions involved can be found in the Appendices, which present the complete list of transitions being used. We then give details of the implementation, and discuss how to go about embedding genetic search within the framework. Finally, we sum up the main points raised in the paper, pinpoint some characteristics of the framework, and indicate directions that we are currently pursuing so as to extend it further.


2.  THE  CELLULAR  AUTOMATON 

2.1  A  REMARK  ON  SEXUAL  REPRODUCTION  IN  CELLULAR  AUTOMATA 

Considering the role of sexual reproduction in the provision of variability in nature, and the fact that the main genetic search methods rely upon sexual reproduction, it is appealing to have such a feature also appearing in the present case.

Although a number of cellular automata exhibiting the ability of self-reproduction have been discovered (see von Neumann, Codd, Banks, Langton, and Byl), no cellular automaton capable of sexual reproduction has apparently been
reported.  The  closest  reference^®  in  the  literature  seems  to  be  where  an  abstract 
discussion is carried out on how to extend the cellular automaton described in von Neumann so as to allow sexual reproduction; however, the complexity of the automaton renders it completely impractical for present purposes.

The complexity of those self-reproducing cellular automata, as expressed by the number of states in the initial configuration, as well as the number of possible states per cell, varies significantly and depends on the design constraints imposed on them. In particular, the imposition that an automaton should possess the abilities of universal computability and/or universal constructability implies an extreme complexity.[2] On the other hand, one wishes to create automata that are prevented from exhibiting a trivial self-reproduction[3] whose oversimplification would preclude the modeling of any interesting issue involved in natural self-reproduction.

The  standpoint  adopted  here  is  somewhere  between  these  extremes,  since  we 
have to satisfy a number of constraints such as the necessity of a mating configuration for the parental organisms, the necessity of having to cope with the movement
of  the  parents  and  of  the  offspring  as  reproduction  takes  place,  the  premise  of  being 
able  to  describe  the  activity  of  the  organisms  from  a  high-level  perspective,  etc. 
These  and  other  constraints  will  become  clearer  in  the  next  sections. 


2.2  THE  GENERAL  PICTURE 

The simplest way to envisage our framework is by means of the metaphor of an artificial-life world in which worm-like organisms randomly roam around a two-dimensional world defined by the automaton's cell space. Each organism can have arbitrary length and is defined by a sequence of contiguous cells which constitute its body, as depicted in Figure 2. The two cells at both ends of an organism, the terminal cells, always take on a T-state, and can be intuitively thought of as its head and tail. The other cells between the terminal ones are the actual genomes which are the objects of the genetic search. Each cell of the genome represents a gene locus, while its state, represented here by a g-state, is one of the possible alleles for that particular gene. It should be noted that gene and terminal states represent classes of states. Throughout the paper, whenever we refer to a T-state or a g-state, we mean any member of the respective class; in the situations where it is necessary to distinguish between different states (as in Figure 2), a subscript is used.

[2] As pointed out in Langton, as far as biological self-reproduction is concerned, neither of them seems to currently apply and it is very unlikely that they ever did.

[3] In Langton it is also remarked that the self-reproduction of a 2-state cellular automaton performing addition modulo 2 fits into this category since it can be entirely described at the level of the automaton's underlying physics.


FIGURE 2 Example of an n-gene-long organism in the horizontal position. The g-states represent the states related to a particular genetic search being performed. Any well-formed organism must have its g-state cells delimited by the left and right terminals (T-states).


Whenever possible each organism moves, each movement starting either leftwards or along the ascending diagonal on its left-hand side; as Figure 1 clarifies, we can say that the head of the organism can move either to the left or the top-left cells of the neighborhood. The top and right-hand edges of the cell space are wrapped around, respectively, with the bottom and left-hand edges, giving the cellular space a toroidal geometry. With such periodic boundary conditions, these two movements are sufficient to ensure that the organisms have the ability to cover the entire world. In this way the organisms are able to approach any other in the world and, when two of them reach a predefined spatial configuration relative to each other, they mate and reproduce; after each mating, they begin wandering again, as do their offspring. Although the parental genomes can have different lengths, it can be seen that the newborn's length will not be more than one gene longer than the length of the longest parent.
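
The claim that leftward and top-left moves suffice to cover a toroidal world can be checked with a small reachability search. The sketch below is only an illustration under simplifying assumptions: it tracks a single head position, ignores other organisms, and uses hypothetical names (`MOVES`, `reachable`) that do not come from the paper.

```python
from collections import deque

# (row offset, column offset): one cell to the left, or one cell up and to the left.
MOVES = ((0, -1), (-1, -1))


def reachable(rows, cols, start=(0, 0), wrap=True):
    """Return the set of cells a head can reach using only the two moves."""
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if wrap:
                # Periodic boundary conditions: opposite edges are identified.
                nr, nc = nr % rows, nc % cols
            elif not (0 <= nr < rows and 0 <= nc < cols):
                continue  # without wrapping, moves off the edge are impossible
            if (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


torus_cells = len(reachable(5, 7))
bounded_cells = len(reachable(5, 7, start=(4, 6), wrap=False))
```

With wrapping every one of the 5 x 7 cells is reached, whereas on a bounded grid only the cells above and to the left of the start are, which is why the periodic boundary matters for the two-movement design.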

All this artificial-life-type activity takes place over a quiescent background, that is, the inactive regions in the cell space that are not occupied by the cells of any organism, and that are represented here by q-states.[4] The neighborhood we use for the state transitions of any cell is the "Moore" neighborhood, defined by the cell itself and the eight adjacent cells that surround it in a square lattice, as Figure 1 shows.
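
A short sketch of the Moore neighborhood on a toroidal lattice may help fix ideas. The grid representation (a list of lists of one-character states) and the symbol `"."` for the quiescent state are assumptions made for this illustration only.

```python
Q = "."  # hypothetical symbol for the quiescent state


def moore(grid, r, c):
    """Return the 9 states of the Moore neighborhood of cell (r, c).

    The neighborhood is the cell itself plus its eight adjacent cells;
    indices wrap around, so the lattice behaves as a torus.
    """
    rows, cols = len(grid), len(grid[0])
    return [grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)]


def is_quiescent_neighborhood(grid, r, c):
    """True when the cell and all eight neighbors are quiescent."""
    return all(state == Q for state in moore(grid, r, c))


# A 3 x 4 world that is empty except for one terminal cell in the top-left corner.
world = [[Q] * 4 for _ in range(3)]
world[0][0] = "T"
```

Because of the wrap-around, the bottom-right cell `(2, 3)` sees the terminal at `(0, 0)` as a diagonal neighbor, while cell `(1, 2)` has a fully quiescent neighborhood and, by the quiescence property, would stay inactive.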


2.3  MOVEMENT  OF  THE  ORGANISMS 

The basic fact about movement is that either leftward or diagonal movement can only start towards a mostly quiescent region of the cell space, but, once started, it will always be completed even if the moving organism has started a reproduction process. In addition, a movement will never proceed if another organism enters the neighborhood of its left-hand side terminal; this prevents organisms from "bumping" into each other.

[4] Quiescence means that an inactive cell that is surrounded just by other inactive cells will remain inactive.

As an organism moves, a special state comes into play so as to occupy the empty place of the cell that has just been "vacated." This movement state, represented here by an m-state, exists within an organism only while the movement is taking place, disappearing as soon as the organism stops. Figures 3 and 4 show organisms moving respectively to the left and diagonally, illustrating the action of the m-state. Although the figures show situations in which the organisms started their movement in one of the two possible directions and carried on in that direction through subsequent steps, in typical situations the organisms move in a composition of both.

Before starting a movement, the organism first "senses" a mostly quiescent neighborhood ahead in order to "check" whether the way ahead is "free." If that is the case, then it "casts" a movement state along the available direction, "trying" to start the movement. This situation can be seen in Figure 3 during the transitions from time $t_0$ to $t_1$ and from $t_2$ to $t_3$, and also in Figure 4 during the transitions from time $t_2$ to $t_3$ and from $t_4$ to $t_5$. If only one direction is available, only one movement state is cast, and the organism just moves in that direction. On the other hand, if both directions are available, a random choice is made among them, but also including the possibility that the organism just does not move, by simply "withdrawing" both movement states.
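
The decision logic of this paragraph can be summarized in a few lines. This is a high-level sketch, not the cell-level transition rules of Appendix A; in particular, the uniform choice among "left", "diagonal", and "stay" when both directions are free is an assumption, since the text does not state the probabilities.

```python
import random


def choose_move(left_free, diagonal_free, rng=random):
    """Decide how an organism's head moves, given which directions are free.

    Returns "left", "diagonal", or None (no movement).
    """
    if left_free and diagonal_free:
        # Both movement states were cast: pick one direction, or withdraw both.
        # Equal probabilities are an assumption of this sketch.
        return rng.choice(["left", "diagonal", None])
    if left_free:
        return "left"
    if diagonal_free:
        return "diagonal"
    return None  # no free way ahead, so no movement state is cast
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing simulations.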

If an organism could not carry on its movement because of some obstacle in its way ahead, soon after all its m-states disappeared its body would remain in a position determined by the path it went through. To compensate for that, we allow an additional kind of movement which is an upward movement of the body, whose effect is, whenever possible, to set the body in the horizontal position; Figure 5 shows one such situation. As will become clearer in subsection 2.5, the body's upward movement is relevant for the reproduction process, since it allows the increase of the rate of preservation of (non-deleterious) parental gene configurations in the offspring; in other words, its effect is to decrease the randomness associated with the process. But independently of this justification, the upward movement is interesting in itself due to the extra "realism" that it adds to the activity of the organisms without the need of any extra state.

FIGURE 3 Succession of snapshots of the same set of cells as a 2-gene-long organism moves 2 cells leftwards in successive iterations. The dots represent the quiescent state.


FIGURE 4 Successive snapshots of the same set of cells as a 2-gene-long organism moves 3 cells diagonally, from a horizontal initial position. The dots represent the quiescent state.


Note that, as a consequence of the three movements, the actual movement of the organisms is typically a composition of all of them, with different cells of an organism moving in any of the three directions at the same time. The overall spatial disposition of the genomes in the cell space of the automaton is then always monotonically descending from the left, and, it is tempting to say, in a worm-like fashion. The complete list of the state transitions for movement can be found in Appendix A.


FIGURE  5  Subsequent  snapshots  of  the  same  set  of  cells  showing  the  body 
adjustment  of  a  3-gene-long  organism,  from  an  arbitrary  initial  position.  The  dots 
represent  the  quiescent  state. 


2.4  SELECTION 

Selection takes place in the following way: if for some reason the state of a gene or a terminal cell changes to the quiescent state in a mostly quiescent neighborhood, the entire organism vanishes; the process occurs in a stepwise way, during the next set of iterations of the automaton. This feature is equivalent to saying that organisms which lose (at least) one terminal state and/or one g-state are not considered to be proper, well-formed organisms and then must die out. Appendix B presents the complete list of state transitions for selection; it is worth noting there what we mean here by a "mostly quiescent neighborhood."

As far as applications are concerned, it is necessary to design appropriate transition rules whose actions impose quiescence on at least one terminal or g-state in a particular neighborhood; as soon as this organism happens to be in a mostly quiescent neighborhood, it will eventually die out. For example, suppose one wishes all non-homogeneous genomes (i.e., genomes presenting two or more different types of genes) to die out. In this case, a rule has to be added to the system so as to detect the presence of two different neighboring genes in the same organism, so that, in this situation, the gene in the center cell of the neighborhood is deleted, i.e., that cell becomes quiescent. As a consequence, as soon as newborn genomes with this deleterious feature happened to be in a mostly quiescent neighborhood they would die out.
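
The homogeneity example can be sketched on a simplified, one-dimensional view of a single organism. The list representation, the symbols `"."` and `"T"`, and the helper names are assumptions of this sketch; the real rule would act on Moore neighborhoods in the two-dimensional cell space, and the subsequent stepwise vanishing of the organism is not modeled here.

```python
Q, T = ".", "T"  # hypothetical symbols for the quiescent and terminal states


def is_gene(state):
    """Any state that is neither quiescent nor terminal is treated as a gene."""
    return state not in (Q, T)


def delete_mismatched_genes(body):
    """Apply the example rule once, synchronously, along an organism's body.

    A gene cell with a neighboring gene of a different type becomes quiescent.
    """
    new_body = list(body)
    for i, state in enumerate(body):
        if not is_gene(state):
            continue
        neighbors = [body[j] for j in (i - 1, i + 1) if 0 <= j < len(body)]
        if any(is_gene(n) and n != state for n in neighbors):
            new_body[i] = Q
    return new_body


def is_well_formed(body):
    """A well-formed organism has terminals at both ends and only genes between."""
    return (len(body) >= 3 and body[0] == T and body[-1] == T
            and all(is_gene(s) for s in body[1:-1]))


homogeneous = delete_mismatched_genes([T, "a", "a", T])
mixed = delete_mismatched_genes([T, "a", "a", "b", T])
```

Here `homogeneous` is left untouched and remains well formed, whereas `mixed` acquires quiescent cells inside its body, so it no longer counts as a well-formed organism and would be selected against.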

An important point here is that selection as described above is not adaptationist, i.e., the organisms are not selected for by some concept of fitness. The emphasis is not on preserving the fitter genomes, but on killing off the ones which have deleterious features in a particular situation. The emphasis thus is on concepts such as viability rather than fitness, and evolution by satisfying world constraints, rather than evolution towards solving predefined problems posed by the world. This way of looking at selection, exaptationism (a contraction for extra-adaptationism), is due to Gould and Vrba and has increasingly gained support in evolutionary theory in recent years (see also Gould and Lewontin and Piatelli-Palmarini). Exaptationism is a generalization of traditional Darwinian adaptationism rather than an opposition to it, and its support has been due to the fact that exaptationist explanations in evolutionary theory have allowed clearer accounts of a number of genomal changes that are neutral in terms of their adaptive value but that are selected nonetheless. In order to keep coherence with the exaptationist standpoint, we should replace the concept of a "useful" building block with that of a non-deleterious one. In the current approach what is guaranteed is that any organism that is selected has some non-deleterious building block, even though it may be useless (note the contrast with the traditional parlance within the context of standard genetic search methods).


2.5  REPRODUCTION 

Two organisms with any length will mate if they align their heads and their first gene, leaving a layer of quiescent states in between; the first state transition depicted in Appendix C clarifies this situation (the rest of the Appendix shows all the other transitions involved in reproduction). In the mating configuration, one of the parental organisms is on top of the quiescent layer and the other below, their heads being in the same column of the cellular space. Reproduction then goes on so that the new organism is produced in the quiescent layer, starting from the matching heads and stretching to the right. Born this way, the length of the newborn genome is never more than one gene longer than the length of its longest parental genome. Just after reproduction starts, as soon as the parent on the top finds its way ahead "free," it restarts its movement; immediately after the way ahead is free for the newborn it too moves, even if its reproduction has not yet finished. Finally, the same thing happens to the parent on the bottom.
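
One possible reading of the mating configuration is sketched below as a test on a single quiescent cell. The exact condition is given by the first transition of Appendix C, which is not reproduced here, so everything in this fragment is an assumption: the grid symbols, the function name `mating_site`, and in particular the requirement that the cells to the left of both heads be quiescent (used to tell a head from a tail).

```python
Q, T = ".", "T"  # hypothetical symbols for the quiescent and terminal states


def is_gene(state):
    return state not in (Q, T)


def mating_site(grid, r, c):
    """True if quiescent cell (r, c) lies between two aligned heads.

    Assumed layout: one parent in row r-1, the quiescent layer in row r,
    the other parent in row r+1, heads in column c, first genes in column c+1.
    """
    rows, cols = len(grid), len(grid[0])

    def at(dr, dc):
        return grid[(r + dr) % rows][(c + dc) % cols]

    heads_aligned = at(-1, 0) == T and at(1, 0) == T
    heads_are_left_ends = at(-1, -1) == Q and at(1, -1) == Q
    first_genes_aligned = is_gene(at(-1, 1)) and is_gene(at(1, 1))
    layer_is_quiescent = all(at(0, dc) == Q for dc in (-1, 0, 1))
    return (heads_aligned and heads_are_left_ends
            and first_genes_aligned and layer_is_quiescent)


world = [
    list("......"),
    list(".TabT."),  # top parent, two genes long
    list("......"),  # quiescent layer where the newborn would grow
    list(".TcT.."),  # bottom parent, one gene long
    list("......"),
]
```

In this example `mating_site(world, 2, 1)` holds, so a newborn head could be created at row 2, column 1, and the newborn would then stretch to the right along the quiescent layer.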

The cells of the newborn are created one at a time, both the genes and the terminal states. There are four basic classes of state transitions for reproduction: deterministic rules leading to a T-state, non-deterministic rules leading only to a g-state or only to a T-state, and further non-deterministic ones leading to either of them. Reproduction starts by creating a head for the newborn whenever the situation described above takes place (although the actual T-state used is randomly chosen among the parental ones). Then it proceeds in a non-deterministic fashion by creating its genes. Finally, it creates the newborn's tail in a non-deterministic way, unless one of the following happens: first, the newborn has "moved too much" even before completely born (i.e., an m-state reached its right-hand extremity, as the last transition in Appendix C shows), or second, there is no more possibility for the newborn to acquire a gene from its parents (as shown in transitions 6, 7, and 8).

The fundamental point about reproduction is that it must be able to provide variability without being disruptive; i.e., it should allow for the preservation of the non-deleterious configurations of genes already existing in the neighborhood; in other words, the viable building blocks within the neighborhood should be preserved. Since in the current approach any genome that is able to exist in the cellular space has some viable building block, what we have to do is to allow the probability distribution of the non-deterministic rules to favor the reappearance of building blocks of the parental genomes, which are defined in the newborn by its most recently created cell and by the cell that is about to be created; this is accomplished by equally distributing the probability of the state transitions accordingly.

If there are no building blocks to be preserved, we just randomly choose any
of the parental genes present in the neighborhood. Because reproduction is not
prevented from taking place while the parental genomes are in movement, it may
be the case that no parental gene is present in a neighborhood (see transition 3 in
Appendix C for clarification). In this situation, the newborn gene to be created is
randomly chosen from all currently possible genes. It should be mentioned that, even
when there are building blocks to be preserved in the neighborhood, a gene can also
be created through the latter process, thus giving a minimal uniform bias towards
all possible g-states of the application concerned, equivalent to the maintenance of
a residual background mutation. The g'-state which appears in Appendix C refers
to a g-state created in the newborn in the way we have just described.
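
The gene-creation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's Cellsim code: a building block is taken to be an ordered pair of adjacent genes, and the function name, its arguments, and the `mutation_rate` parameter are all hypothetical.

```python
import random

def choose_newborn_gene(prev_gene, parental_pairs, neighborhood_genes,
                        all_genes, mutation_rate=0.01, rng=random):
    """Pick the next gene of a newborn (illustrative sketch only).

    prev_gene          -- most recently created gene of the newborn
    parental_pairs     -- set of (gene, next_gene) pairs found in the parents
                          (assumed representation of a building block)
    neighborhood_genes -- list of parental genes visible in the neighborhood
    all_genes          -- every gene state allowed by the application
    """
    # Residual background mutation, also used when no parental gene is
    # present in the neighborhood: uniform choice over all possible genes.
    if rng.random() < mutation_rate or not neighborhood_genes:
        return rng.choice(sorted(all_genes))
    # Candidates that would recreate a parental building block.
    preserving = [g for g in neighborhood_genes
                  if (prev_gene, g) in parental_pairs]
    if preserving:
        return rng.choice(preserving)  # probability shared equally
    # No building block to preserve: any parental gene in the neighborhood.
    return rng.choice(list(neighborhood_genes))
```

With `mutation_rate=0.0`, a newborn whose last gene is `"a"` and whose parents contain the pair `("a", "b")` always receives `"b"` when `"b"` is in the neighborhood.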

We  can  now  return  to  the  motivation  for  having  the  upward  movement  of  the 
body, as mentioned in subsection 2.3. According to the preceding paragraph, the
emphasis  of  reproduction  is  on  the  preservation  of  the  parental  building  blocks.  So, 
if  the  organisms  did  not  have  the  upward  movement,  the  chance  that  a  gene  in  the 
newborn  was  created  from  a  neighborhood  with  few  or  no  parental  genes  would  be 
greater.  The  consequence  would  be  that  the  rate  of  preservation  of  viable  parental 
gene  configurations  would  be  smaller.  Then,  as  hinted  at  earlier,  the  exploration  of 
the  search  process  would  be  more  random,  less  oriented  by  the  current  state  of  the 
search. 

Note that the transition above is fairly complex by normal standards in cellular
automata applications. It should be clear, however, that our interest here is
not in the emergence of reproduction, but in what can be developed assuming
reproduction as a primitive we can rely on, and to a certain extent, manipulate.
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3.  DISCUSSION 

The cellular automaton described was implemented on a Sun workstation using
Cellsim 2.5, a public domain environment for cellular automata experiments;[1] the
current implementation supports up to 256 different states in the world. Although
the movement and the quiescent states are implemented as one state each, the
terminal and the gene states are defined as ranges of state values specified by the
user. The latter is important because it allows for the introduction of new features
in the framework without necessarily creating conflicts with the existing transitions;
for example, it would be possible to add new kinds of heads, each of them with
distinctive properties (we return to this point later on in this Section). Another
feature  of  the  implementation  is  that  it  is  possible  to  control  the  non-determinism 
of  the  transitions  by  means  of  a  set  of  parameters  whose  values  are  decided  by  the 
user;  for  example,  it  is  possible  to  control  the  “amount”  of  each  kind  of  movement, 
the  rate  of  background  mutation,  etc. 

In running experiments, even though selection is killing off organisms all the
time, because the cellular space is finite sooner or later it gets overpopulated. As a
consequence, we experience a crowding effect which implies that, after some degree
of crowding is achieved, it becomes less likely that a reproduction involving long
parents will be able to produce a similarly long offspring. The point is that fewer
and fewer quiescent cells become available and so, once reproduction starts, it is
normally curbed by a moving organism that gets into the quiescent layer in which the
newborn is being created. But then the parental organisms start moving again, and
similarly the newborn; as soon as the newborn’s last gene also moves, reproduction
necessarily stops, as mentioned earlier. The effect then is that, as the cellular space
gets more and more crowded, an increasing bias towards shorter length genomes
takes place.

Note  however  that  the  real  agent  of  the  bias  is  the  transition  (the  last  one  in 
Appendix  C)  that  adds  the  tail  to  the  newborn  as  soon  as  it  moves;  in  other  words, 
there  is  an  intrinsic  selective  pressure  defined  by  the  rule.  It  is  worth  observing 
that  the  crowding  effect  is  due  to  the  global  behavior  of  the  automaton,  which 
“amplifies”  the  selective  pressure  already  implicit  in  the  rule.  One  way  to  minimize 
such  an  effect  is  to  allow  a  background  selective  process  which  would  randomly  set 
cells  to  the  quiescent  state.  This  can  be  done  by  just  adding  a  non-deterministic  rule 
that  leads  to  quiescence  with  a  small  probability,  which  would  have  to  be  worked 
out  empirically,  according  to  the  domain  concerned,  as  well  as  to  the  size  of  the 
cellular  space  being  used. 
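
As a rough sketch of such a background selective process, the Python fragment below resets each cell to quiescence independently with a small probability. The encoding of the quiescent state as `0`, the grid-of-lists representation, and the names `QUIESCENT`, `background_selection`, and `p_quiescence` are assumptions made for illustration; in the actual automaton the rule would be one more non-deterministic state transition.

```python
import random

QUIESCENT = 0  # hypothetical encoding of the quiescent state

def background_selection(grid, p_quiescence, rng=random):
    """Return a copy of grid in which every cell has independently been
    set to the quiescent state with probability p_quiescence."""
    return [[QUIESCENT if rng.random() < p_quiescence else cell
             for cell in row]
            for row in grid]
```

The probability would have to be tuned empirically, as the text notes: too large a value destroys organisms faster than reproduction replaces them, too small a value leaves the crowding effect in place.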

The studies on cellular automata dynamics presented in Langton suggest that,
as far as the emergence of computation and life in natural and artificial systems is
concerned, the “interesting” dynamics lies between order and disorder. Although
the characterization of these dynamic regimes is not precise, there are some
recurring patterns that have been accepted as necessary, such as the existence of very


[1] The C code that implements the automaton’s state transitions is available from the author.
long transients, dependence on the size of the cellular space, high (but not
maximal) temporal and spatial correlation between the cell states, and the existence of
propagating structures. It happens that, provided that the overpopulation of the
cell space is avoided, all these features have been captured in the cellular automaton
described without having them as design constraints.

Although  we  are  aware  of  the  biological  implausibility  of  the  framework  as 
it  stands,  there  are  a  number  of  features  that  can  be  easily  altered  or  added  so 
that  richer  frameworks  can  be  built,  which  could  lead  to  models  of  some  aspect 
of  biological  life  as  well  as  testbeds  for  artificial  life.  For  example,  mating  here 
is,  in  principle,  a  matter  of  chance,  not  being  driven  by  any  characteristics  of  the 
domain  (such  as  fitness).  However,  if  one  wishes  to  impose  some  selective  mating 
among the organisms, it is enough to write a state transition, similar to the first
one  in  Appendix  C,  with  the  difference  that  it  would  contain  the  specific  parental 
genes  that  would  allow  reproduction  to  start.  By  placing  this  new  transition  before 
the  equivalent,  more  general  one,  in  the  actual  code  of  the  automaton,  the  former 
would  prevail  over  the  latter  without  bringing  any  contradiction  to  the  system’s 
behavior.  It  should  be  clear  that  this  example  is  absolutely  general  for  any  other 
aspect  that  one  wishes  to  embed  in  the  cellular  automaton,  and  is  indeed  a  central 
issue  on  the  “programmability”  of  the  framework;  all  that  is  needed  is  to  satisfy 
the  set  of  “hardwired”  constraints  defined  by  the  existing  state  transitions  and  the 
kinds  of  states  they  involve.  We  refer  to  this  important  feature  as  the  addition  of 
instantiated  transitions. 
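
The precedence mechanism behind instantiated transitions amounts to first-match rule ordering, which can be sketched as follows. The three-cell neighborhood, the state labels (`"T"`, `"g7"`, `"q"`), and the outcome names are hypothetical placeholders rather than the transitions of Appendix C.

```python
ANY = None  # wildcard: matches whatever state the cell holds

def matches(pattern, neighborhood):
    """True if every non-wildcard entry of pattern equals the cell state."""
    return all(want is ANY or want == got
               for want, got in zip(pattern, neighborhood))

def apply_rules(rules, neighborhood, default):
    """Return the outcome of the first rule whose pattern matches."""
    for pattern, outcome in rules:
        if matches(pattern, neighborhood):
            return outcome
    return default

# General rule: reproduction starts whatever gene lies between two heads.
general = (("T", ANY, "T"), "start_reproduction")
# Instantiated rule: the same pattern with one specific parental gene.
instantiated = (("T", "g7", "T"), "start_selective_reproduction")

# The instantiated rule is listed first, so it prevails when both match.
rules = [instantiated, general]
```

Because the instantiated pattern is a special case of the general one, listing it first changes the behavior only for the neighborhoods it names, which is why no contradiction arises.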

Through  the  same  kind  of  reasoning,  it  would  be  very  simple  to  allow  the 
terminal states to be represented by distinct head and tail states. It is possible to
go even further so as to allow the existence of different kinds of heads, which could
be associated with the feature of specialization towards either of the directions
of movement. A natural consequence would then be the addition of instantiated
transitions  to  start  reproduction  so  as  to  allow  the  movement  specialization  to  be 
passed  on  to  the  newborn,  according  to  various  possible  schemes,  such  as  that  the 
newborn  of  parents  specialized  in  moving  in  the  same  direction  would  be  more  likely 
to  move  in  that  direction. 

As far as reproduction is concerned, one could think of adding instantiated
transitions that would change the distribution of probability of the non-deterministic
transition  rules  so  as  to  change  the  current  bias  towards  the  formation  of  building 
blocks  according  to  some  weighted,  domain-dependent  function  of  the  number  of 
building  blocks  that  each  candidate  state  defines.  A  trivial  example  would  be  just 
a  weighted  distribution  according  to  the  number  of  building  blocks  associated  to 
each  candidate  state. 
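
The trivial weighted scheme just mentioned could look like the sketch below, in which each candidate state is drawn with probability proportional to the number of building blocks it would form. The dictionary `block_counts` and the function name are hypothetical, and the uniform fallback for the all-zero case is an added assumption.

```python
import random

def weighted_gene_choice(block_counts, rng=random):
    """Choose a candidate state with probability proportional to the
    number of building blocks associated with it.

    block_counts -- dict mapping candidate state -> number of blocks
    """
    genes = list(block_counts)
    weights = [block_counts[g] for g in genes]
    if sum(weights) == 0:
        # No candidate forms a building block: fall back to a uniform draw.
        return rng.choice(genes)
    return rng.choices(genes, weights=weights, k=1)[0]
```

A domain-dependent variant would simply replace the raw counts by some function of them before the draw.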
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4.  CONCLUSIONS  AND  PROSPECTS 

The primary intention of this paper is to show a particular, non-deterministic
cellular automaton that is able to embed genetic search. We pointed out that this
automaton is just a member of a family, and also showed how its space of possible
extensions can be explored. The framework provides flexibility to embed a number
of features that could be used primarily for artificial-life experiments, and perhaps,
also for some aspects of biological modeling. Then we showed that cellular automata
can provide a distinctive framework to embed genetic search, which is meant here
as a technique to explore a search space by means of non-deterministic mechanisms
that provide variability and selection regarding the points of the space, in such a
way that viable building blocks are created and built upon. As far as genetic search
is concerned, the discussions were general, no attempt having been made to discuss
any particular application in detail.

We stress that the possibility of defining the automaton in terms of the four
primitive, “hardwired” concept states was an essential achievement, since the
genetic search becomes dependent on this small set of state categories, ultimately
rendering the programmability of the framework fairly simple, as we tried to show.
As far as the general issue of cellular automata to embed genetic search is concerned,
further developments can be directed to any aspect of their definition, bearing in
mind for example, that only one (topological) species can exist in the automaton
described.

As  far  as  the  artificial-life  world  embedded  in  the  automaton  is  concerned,  its 
major  drawback  has  to  do  with  the  provision  of  interaction  between  organisms, 
which  is  currently  very  poor.  Note  that  the  only  kinds  of  interaction  provided  are 
reproduction,  and  the  ones  derived  from  movement,  as  when  an  organism  is  in  the 
way  of  another.  However,  a  neat  solution  for  this  problem  exists  and  is  currently 
being worked out.[2] The definition of regions in the world composed of a new class
of environmental E-states that could be “touched” by the organisms and resulting
in  mutual  state  modification  would  certainly  solve  the  problem.  The  interactions 
among  the  organisms  would  then  be  made  through  the  environmental  states  with 
virtually  unbounded  richness. 

Another  extension  that  is  also  being  worked  out  refers  to  the  introduction  of  the 
concept of an intermediate state, which would allow the organism to have two halves:
the  first  half,  representing  the  genotype  as  discussed  in  this  paper,  and  the  second 
half,  representing  the  phenotype.  The  idea  is  that  a  newborn  will  be  subjected  to 
a  developmental  phase  before  it  is  fully  created  (in  an  egg-like  fashion).  So,  after 
the  reproductive  process  has  created  the  newborn  with  only  its  genotype,  as  soon 
as  its  top  parent  leaves,  a  developmental  process  starts  leading  to  the  creation  of 
the  second  half,  where  the  genotype  will  be  expressed. 

The  use  of  genetic  search  in  cellular  automata  demands  that  a  question  being 
addressed  be  subjected  to  a  formulation  based  on  local  constraints.  This  may  be 

[2] This is one of the topics in a forthcoming paper.
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difficult  in  a  number  of  situations.  Another  source  of  difficulty  that  might  even 
preclude  particular  applications  is  that  the  sort  of  framework  we  discussed  usually 
demands  a  great  deal  of  computational  power. 

The  clear  concepts  of  space  and  time  that  cellular  automata  embed,  the  strong, 
even  fanciful  sense  of  realism  they  get  across,  and  their  ability  to  support  unification 
between operand and operator are just some of the features they intrinsically carry
with them, which are desired in artificial-life studies in general, as discussed in
Langton. As for aspects particularly relevant to genetic search, we can identify
the  fact  that  the  state  evolution  of  a  cellular  automaton  according  to  local  nonlinear 
rules  provides  a  neat  way  to  model  phenotypic  expression;  the  issue  here  is  that 
there  is  a  great  deal  of  work  done  in  cellular  automata  so  that  we  can  avoid  an 
ad  hoc  dynamics,  which  would  be  unsound  from  a  theoretical  point  of  view,  and 
whose  analysis  might  be  very  difficult  to  perform.  In  addition,  because  within  the 
context  of  cellular  automata  there  is  this  well-defined  concept  of  an  underlying 
physics,  it  becomes  natural  to  think  of  a  unified  process  supporting  the  existence  of 
both  genotype  and  phenotype.  A  step  further  is  the  actual  unification  of  evolution, 
development,  and  interactions  of  the  organisms  with  their  environment,  which  is  in 
fact  the  direction  we  are  currently  pursuing. 

We have tried to make the point about the applicability of the framework in
addressing specific problems in artificial life; the question remains though, as to the
effectiveness of the approach, mainly because the latter is beyond the scope of this
paper. I believe however, that the most appropriate kinds of questions that should
be addressed from the perspective we introduced have to do with using the
organisms as probes into the emergence and self-organization of evolving systems that
are not subjected to solving particular problems. What we have in mind here is the
notion that in nature there are no problems being solved, but evolutionary paths
being followed according to the constraints existing at each time (see Varela). I
think that such an appeal comes from two sources: first, the fact that cellular
automata constitute a paradigmatic model for emergence, and second, the increasing
support that the role of self-organization in the origins of order in evolution[3] has
received recently, for example, as in Kauffman.

The  exaptationist  claim  for  looking  at  evolution  from  the  point  of  view  of 
constraint satisfaction clearly fits into the picture formed by the issues above. Now,
even  though  one  could  also  identify  those  ideas  with  adaptationism,  the  constraint 
that  locality  implies  for  selection  in  cellular  automata  seems  to  be  much  more  in 
tune  with  the  exaptationist  standpoint.  One  might  argue  however,  that  since  the 
notion  of  fitness  function  can  certainly  be  interpreted  either  as  a  constraint  to  be 
satisfied,  or  as  a  specification  of  an  evolutionary  path  to  be  followed,  adaptationism 
and  exaptationism  are  equivalent  from  an  implementational  point  of  view,  and,  as  a 
consequence,  the  difference  between  them  is  “just”  a  matter  of  point  of  view  of  the 
experimenter.  Although  agreeing  with  the  premises,  I  reject  the  conclusion  drawn 
from it. I believe that the latter is exactly the crucial distinction between the two: if
the  issue  at  stake  is  self-organization,  where  the  emphasis  is  on  the  ongoing  process 


[3] We could even say, order in spite of selection as contrasted to due to it.
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rather  than  on  the  endpoint  reached,  the  least  biased  and  consequently,  the  most 
natural  standpoint  to  take  seems  to  be  exaptationism.  It  is  interesting  to  observe 
that,  although  this  point  of  view  is  shared  among  many  current  evolutionists,  it 
does  not  seem  to  be  the  case  for  many  practitioners,  say,  in  the  genetic  algorithm 
community,  where  adaptationism  clearly  prevails. 
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APPENDIX 

These Appendices present the complete list of state transitions for the cellular
automaton discussed. Unless otherwise stated, all the rules not shown are supposed
to preserve the state of the centre cell (this applies in particular to the quiescent
rule, which preserves a quiescent state when all the surrounding cells are also 0). For
all the Appendices the following holds:

■ The symbol # is a don't care referring to any of the following states: T, g, or
0.

■ When g/T appears in a neighborhood, it means that the corresponding cell
can take on either of the two states. In addition, if the state transition also
leads to g/T, the new state will follow the one that actually appears in the
neighborhood.

■ When more than one g-state appears in the neighborhood, no distinction is
made between them, independently of their being equal or different to each
other. Any case of ambiguity about which g-state of the neighborhood the
transition leads to is solved by subscripting the g-state by its geographic
location in the neighborhood (according to Figure 1). Equivalent rationale applies
for the neighborhoods which have more than one cell in a T-state or in a state
represented by #.
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■  The  number  of  a  transition  we  sometimes  mention  in  the  text,  refers  to  its 
position  from  the  left  to  the  right,  and  from  the  top  to  the  bottom. 

■  The  transitions  characterized  by  the  symbol  ^  are  non-deterministic;  the  ones 
with  are  deterministic. 

■  The  neighborhoods  showing  both  T-states  of  the  same  organism  are  due  to  the 
smallest  well  formed  organism  which  has  3  cells. 


A.  STATE  TRANSITIONS  FOR  MOVEMENT 
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B.  STATE  TRANSITIONS  FOR  SELECTION 
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C.  STATE  TRANSITIONS  FOR  REPRODUCTION 


■ The state g' means a g-state that is non-deterministically generated according
to the explanation in subsection 2.5.

■  The  index  min  used  in  some  of  the  terminal  states  is  just  an  implementation 
detail  that  defines  the  default  terminal  state  used. 
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Some  Mathematical  Results  on  the  NK  Model 


The NK model is a simple biological model; however, it is hard to analyze
mathematically in detail, except by simulation. In this paper, a relation between
the NK model and the spin model is given in an explicit form. Furthermore, by
calculating the correlation of $W(\eta)$ and $W(\eta_l)$, a rigorous meaning of the
ruggedness of the landscape is presented under a suitable assumption.


1.  INTRODUCTION 

We will consider the NK model which was introduced by Kauffman. First, we
present the rigorous definition of the NK model. Let $N$ be a positive integer and
$K$ be a non-negative integer. The NK model is based on only two alleles at each of
the $N$ genetic loci. In general, the number of alleles at each locus can be extended
to $A \in \{2, 3, \ldots\}$.

Let $\eta$ denote a configuration of genotype, i.e., $\eta = (\eta(1), \ldots, \eta(N))$ with
$\eta(i) \in \{0, 1\}$ $(i = 1, \ldots, N)$. $X = \{0, 1\}^N$ is a configuration space of the
genotype with $N$ loci. Each genetic locus, $i$, has epistatic interactions from $K$
other loci, $\{i_1, \ldots, i_K\}$. The configuration space of the above $2^{K+1}$
contributions of alleles is defined by $X_{K+1} = \{0, 1\}^{K+1}$. It is noted that
$0 \le K \le N - 1$.

The fitness contribution of each locus, $W_i^{(K)}(\eta(i), \eta(i_1), \ldots, \eta(i_K))$, is
specified by the configuration of the alleles of the $K + 1$ loci,
$\Lambda_i(K) = \{i, i_1, \ldots, i_K\}$.

Remark that $\{W_i^{(K)}(\eta(l) : l \in \Lambda_i(K)) : i = 1, \ldots, N,\ (\eta(l) : l \in \Lambda_i(K)) \in X_{K+1}\}$
is the collection of $N \times 2^{K+1}$ random values. Then, for each genotype $\eta \in X$,
the fitness of genotype, $W(\eta)$, is defined as the average of the fitness contribution
of each locus:

$$W(\eta) = \frac{1}{N} \sum_{i=1}^{N} W_i^{(K)}(\eta(i), \eta(i_1), \ldots, \eta(i_K)).$$

The  above-mentioned  model  is  called  the  NK  model. 

The one-mutant neighbor $\eta_j \in X$ $(j = 1, \ldots, N)$ with respect to $\eta \in X$ is given
by: $\eta_j(i) = \eta(i)$ if $i \ne j$ and $\eta_j(j) = 1 - \eta(j)$.
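
The definition can be made concrete with a minimal Python sketch. Two choices here are assumptions rather than part of the definition: the fitness contributions are drawn uniformly from [0, 1), and each locus interacts with the next K loci under a periodic boundary. The function names are hypothetical.

```python
import random
from itertools import product

def make_nk(N, K, rng):
    """Build neighborhoods and random fitness tables for an NK landscape.

    Assumptions: locus i interacts with loci i+1, ..., i+K (periodic),
    and every table entry is an independent uniform draw on [0, 1).
    """
    neighbors = [[(i + d) % N for d in range(K + 1)] for i in range(N)]
    tables = [{bits: rng.random() for bits in product((0, 1), repeat=K + 1)}
              for _ in range(N)]
    return neighbors, tables

def fitness(eta, neighbors, tables):
    """W(eta): average of the N locus contributions."""
    N = len(eta)
    return sum(tables[i][tuple(eta[l] for l in neighbors[i])]
               for i in range(N)) / N

def one_mutant(eta, j):
    """Genotype eta with the allele at locus j flipped (0-based j)."""
    return tuple(1 - a if i == j else a for i, a in enumerate(eta))
```

Each table holds $2^{K+1}$ entries, so the landscape is specified by $N \times 2^{K+1}$ random values, as stated above.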

In this paper, Section 2 will give a relation between the NK model and the
spin  model.  Next,  we  will  obtain  a  rigorous  meaning  of  the  ruggedness  of  fitness 
landscape  of  the  NK  model  in  Section  3.  Finally,  Section  4  is  devoted  to  summary 
and  discussions. 


2.  RELATION  BETWEEN  THE  NK  MODEL  AND  THE  SPIN 
MODEL 

Let $\Omega = \{-1, 1\}^N$ and $\Omega_{K+1} = \{-1, 1\}^{K+1}$. Following Palmer, for each spin
configuration $S \in \Omega$, we define the fitness function, $F(S)$, as a sum of $N$
contributions, with the $i$th contribution depending on $S(i)$ and $K$ other $S(j)$'s:

$$F(S) = \sum_{i=1}^{N} F_i^{(K)}(S(i), S(i_1), \ldots, S(i_K)). \tag{2.1}$$

From the basic relation

$$S(i) = 2\eta(i) - 1 \quad (i = 1, \ldots, N), \tag{2.2}$$

it is easily obtained that

$$F(S) = W(\eta), \tag{2.3a}$$

and

$$F_i^{(K)}(S(i), S(i_1), \ldots, S(i_K)) = \frac{1}{N} W_i^{(K)}(\eta(i), \eta(i_1), \ldots, \eta(i_K)). \tag{2.3b}$$

In a similar fashion, the one-mutant neighbor $S_j \in \Omega$ $(j \in \{1, \ldots, N\})$ with respect
to $S \in \Omega$ is given by: $S_j(i) = S(i)$ if $i \ne j$ and $S_j(j) = -S(j)$.

Each $F_i^{(K)}(S(i), S(i_1), \ldots, S(i_K))$ takes $2^{K+1}$ values; we then have the following
representation of it by simple calculation.
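THEOREM 2.1 In the NK model,

$$
\begin{aligned}
F_i^{(K)}(S(i_0), S(i_1), \ldots, S(i_K))
  ={}& J_{i,0}^{(K)} + \sum_{p_1=0}^{K} J_{i,1}^{(K)}(i_{p_1})\, S(i_{p_1}) + \cdots \\
  &+ \sum_{\substack{p_1, \ldots, p_r \in \{0, \ldots, K\} \\ (p_1, \ldots, p_r)\ \text{distinct}}}
     J_{i,r}^{(K)}(i_{p_1}, \ldots, i_{p_r})\, S(i_{p_1}) \cdots S(i_{p_r}) + \cdots \\
  &+ J_{i,K+1}^{(K)}(i_0, \ldots, i_K)\, S(i_0) S(i_1) \cdots S(i_K),
\end{aligned}
$$

where $i_0 = i$ and

$$
J_{i,r}^{(K)}(i_{p_1}, \ldots, i_{p_r}) = 2^{-(K+1)}
  \sum_{S(i_0) = \pm 1, \ldots, S(i_K) = \pm 1}
  S(i_{p_1}) \cdots S(i_{p_r})\, F_i^{(K)}(S(i_0), S(i_1), \ldots, S(i_K)). \tag{2.4}
$$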


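Then $J_{i,r}^{(K)}(i_{p_1}, \ldots, i_{p_r})$ may be considered as a random coefficient of $r$ points
correlation of $\{S(i_{p_1}), \ldots, S(i_{p_r})\}$. For example, we consider $K = 1$. In this case,
for each $i = 1, \ldots, N$ and $i \ne j$, we can write

$$F_i^{(1)}(S(i), S(j)) = J_{i,0}^{(1)} + J_{i,1}^{(1)}(i)\, S(i) + J_{i,1}^{(1)}(j)\, S(j) + J_{i,2}^{(1)}(i, j)\, S(i) S(j),$$

where

$$
\begin{aligned}
J_{i,0}^{(1)} &= 2^{-2} \sum_{S(i) = \pm 1,\, S(j) = \pm 1} F_i^{(1)}(S(i), S(j)), \\
J_{i,1}^{(1)}(i) &= 2^{-2} \sum_{S(i) = \pm 1,\, S(j) = \pm 1} S(i)\, F_i^{(1)}(S(i), S(j)), \\
J_{i,1}^{(1)}(j) &= 2^{-2} \sum_{S(i) = \pm 1,\, S(j) = \pm 1} S(j)\, F_i^{(1)}(S(i), S(j)), \\
J_{i,2}^{(1)}(i, j) &= 2^{-2} \sum_{S(i) = \pm 1,\, S(j) = \pm 1} S(i) S(j)\, F_i^{(1)}(S(i), S(j)).
\end{aligned}
$$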


Remark that the relation (2.4) implies that $J_{i,r}^{(K)}(i_{p_1}, \ldots, i_{p_r})$ and
$J_{i,m}^{(K)}(i_{p_1}, \ldots, i_{p_m})$ are independent for any distinct pair $(r, m)$, even if
$N \times 2^{K+1}$ $\{F_i^{(K)}(S(i), S(i_1), \ldots, S(i_K))\}$ random variables are independent,
identically distributed (IID).
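
The expansion of Theorem 2.1 can be checked numerically for a single locus. The sketch below computes every coefficient $J$ by the averaging formula and rebuilds $F$ from them; it sums over each set of distinct indices once, as in the $K = 1$ example. The function names and the dictionary representation of $F$ are hypothetical.

```python
from itertools import combinations, product
import random

def spin_product(S, subset):
    """Product of the spins S[p] for p in subset (1 for the empty set)."""
    sign = 1
    for p in subset:
        sign *= S[p]
    return sign

def walsh_coefficients(F, K):
    """Coefficients J for one locus.

    F maps each spin tuple in {-1, 1}^(K+1) to a value; the result maps
    each subset of positions {0, ..., K} to its coefficient.
    """
    spins = list(product((-1, 1), repeat=K + 1))
    J = {}
    for r in range(K + 2):
        for subset in combinations(range(K + 1), r):
            total = sum(spin_product(S, subset) * F[S] for S in spins)
            J[subset] = total / 2 ** (K + 1)
    return J

def reconstruct(J, S):
    """Evaluate the expansion at the spin tuple S."""
    return sum(coeff * spin_product(S, subset) for subset, coeff in J.items())
```

For any table `F`, `reconstruct(walsh_coefficients(F, K), S)` agrees with `F[S]` up to rounding, and the coefficient of the empty subset is the plain average of the $2^{K+1}$ values.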
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3.  RUGGED  LANDSCAPE  OF  THE  NK  MODEL 

In  this  section,  we  show  a  rigorous  result  corresponding  to  the  ruggedness  of  the 
fitness  landscape. 

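THEOREM 3.1 Assume that $N \times 2^{K+1}$
$\{F_i^{(K)}(S(i), S(i_1), \ldots, S(i_K)) : i = 1, \ldots, N,\ (S(i), S(i_1), \ldots, S(i_K)) \in \Omega_{K+1}\}$
random variables are IID with mean $m(F)$ and variance $v(F)$. Then

$$E[F(S)F(S')] = N^2 m(F)^2 + \{N - (K + 1)\}\, v(F), \tag{3.1}$$

for any $S, S' \in \Omega$, where $S'$ is one-mutant neighbor of $S$.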

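PROOF Since $\{F_i^{(K)}(S(i), S(i_1), \ldots, S(i_K)) : i = 1, \ldots, N,\ (S(i), S(i_1), \ldots, S(i_K)) \in \Omega_{K+1}\}$
is a collection of IID random variables, we have

$$
\begin{aligned}
E[F(S)^2] &= \sum_{i=1}^{N} E\big[\{F_i^{(K)}(S : \Lambda_i(K))\}^2\big]
  + \sum_{i \ne j} E\big[F_i^{(K)}(S : \Lambda_i(K))\big]\, E\big[F_j^{(K)}(S : \Lambda_j(K))\big] \\
  &= N v(F) + N^2 m(F)^2,
\end{aligned}
$$

where $F_i^{(K)}(S : \Lambda_i(K)) = F_i^{(K)}(S(i), S(i_1), \ldots, S(i_K))$ for $i = 1, \ldots, N$.

On the other hand, there is an $l \in \{1, \ldots, N\}$ such that $S' = S_l$. Then, in a
similar fashion, we get

$$
\begin{aligned}
E[\{F(S') - F(S)\}^2]
  &= E\Big[\Big\{\sum_{i=1}^{N} \big(F_i^{(K)}(S_l : \Lambda_i(K)) - F_i^{(K)}(S : \Lambda_i(K))\big)\Big\}^2\Big] \\
  &= \sum_{i :\, l \in \Lambda_i(K)} E\big[\{F_i^{(K)}(S_l : \Lambda_i(K)) - F_i^{(K)}(S : \Lambda_i(K))\}^2\big] \\
  &\quad + \sum_{\substack{i, j :\, i \ne j, \\ l \in \Lambda_i(K),\, \Lambda_j(K)}}
     E\big[F_i^{(K)}(S_l : \Lambda_i(K)) - F_i^{(K)}(S : \Lambda_i(K))\big] \\
  &\qquad\qquad \times E\big[F_j^{(K)}(S_l : \Lambda_j(K)) - F_j^{(K)}(S : \Lambda_j(K))\big] \\
  &= 2(K + 1)\, v(F).
\end{aligned}
$$

Therefore,

$$
\begin{aligned}
E[F(S')F(S)] &= \frac{1}{2}\Big(E[F(S')^2] + E[F(S)^2] - E[\{F(S') - F(S)\}^2]\Big) \\
  &= N^2 m(F)^2 + \{N - (K + 1)\}\, v(F).
\end{aligned}
$$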

Note  that  Theorem  3.1  is  also  obtained  by  using  Theorem  2.1:  however,  the  proof 
is  more  complicated  than  the  above  one. 
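
A seeded Monte Carlo estimate gives a quick sanity check of Eq. (3.1). The sketch assumes uniform contributions on [0, 1) (so $m(F) = 1/2$ and $v(F) = 1/12$) and adjacent neighborhoods with a periodic boundary, which guarantees that each locus lies in exactly $K + 1$ neighborhoods; the function names are hypothetical.

```python
import random
from itertools import product

def simulate_product(N, K, trials, seed=0):
    """Estimate E[F(S) F(S_l)] by redrawing the random tables each trial."""
    rng = random.Random(seed)
    neighbors = [[(i + d) % N for d in range(K + 1)] for i in range(N)]
    configs = list(product((-1, 1), repeat=K + 1))
    S = tuple(rng.choice((-1, 1)) for _ in range(N))
    l = 0  # locus whose spin is flipped to obtain the one-mutant neighbor
    S_l = tuple(-s if i == l else s for i, s in enumerate(S))
    total = 0.0
    for _ in range(trials):
        tables = [{c: rng.random() for c in configs} for _ in range(N)]
        F = sum(tables[i][tuple(S[j] for j in neighbors[i])]
                for i in range(N))
        F_l = sum(tables[i][tuple(S_l[j] for j in neighbors[i])]
                  for i in range(N))
        total += F * F_l
    return total / trials

def predicted(N, K, mean, var):
    """Right-hand side of Eq. (3.1)."""
    return N ** 2 * mean ** 2 + (N - (K + 1)) * var
```

For $N = 6$ and $K = 2$ the prediction is $36 \cdot 0.25 + 3/12 = 9.25$, and the estimate from `simulate_product(6, 2, 20000)` should fall close to that value, within sampling error.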

Furthermore, the following result is easily derived from relations (2.2) and (2.3),
and Theorem 3.1.

COROLLARY 3.2 Assume that $N \times 2^{K+1}$
$\{W_i^{(K)}(\eta(i), \eta(i_1), \ldots, \eta(i_K)) : i = 1, \ldots, N,\ (\eta(i), \eta(i_1), \ldots, \eta(i_K)) \in X_{K+1}\}$
random variables are IID with mean $m(W)\ (= N m(F))$ and variance
$v(W)\ (= N^2 v(F))$. Then

$$E[W(\eta)W(\eta_l)] = m(W)^2 + \frac{N - (K + 1)}{N^2}\, v(W), \tag{3.2}$$

for any $\eta, \eta_l \in X$, where $\eta_l$ is one-mutant neighbor of $\eta$.

For example, $\Lambda_i(K) = \{i - l, \ldots, i, \ldots, i + r\}$ with $l + r = K$, $l \ge 0$, $r \ge 0$ and
periodic boundary condition is one of the typical cases of above-mentioned results.

Equation (3.2) implies that as $K$ increases from 0 to $N - 1$, the correlation
of $W(\eta_l)$ and $W(\eta)$ decreases monotonically. This corresponds to the
following statement of Kauffman: Increasing the richness of epistatic interactions, $K$,
increases the ruggedness of fitness landscape. In particular, when $K = N - 1$, Eq.
(3.2) is equal to

$$E[W(\eta_l)W(\eta)] = E[W(\eta_l)]\, E[W(\eta)]. \tag{3.3}$$

Hence, it shows that $\{W(\eta) : \eta \in X\}$ is the collection of uncorrelated random
variables.
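
As a short worked consequence, not stated in this form in the text, Eq. (3.2) can be normalized into a correlation coefficient. Under the IID assumption of Corollary 3.2, $E[W(\eta)] = m(W)$, and the first step of the proof of Theorem 3.1 gives $E[W(\eta)^2] = v(W)/N + m(W)^2$, so that

$$
\begin{aligned}
\operatorname{Var}[W(\eta)] &= \frac{v(W)}{N}, \\
\operatorname{Cov}[W(\eta), W(\eta_l)] &= E[W(\eta)W(\eta_l)] - m(W)^2 = \frac{N - (K + 1)}{N^2}\, v(W), \\
\rho(K) &= \frac{\operatorname{Cov}[W(\eta), W(\eta_l)]}{\operatorname{Var}[W(\eta)]} = 1 - \frac{K + 1}{N}.
\end{aligned}
$$

The coefficient falls linearly from $1 - 1/N$ at $K = 0$ to $0$ at $K = N - 1$, which is the monotone decrease described above.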


4.  SUMMARY  AND  DISCUSSIONS 

First, this paper presented a rigorous definition of the NK model. Next, by using
this definition, we obtained the relation between the NK model and the spin model. This
result suggests that the NK model is more difficult than the spin model to analyze
mathematically. Finally, in the simple case, we showed by the direct computation of
$E[W(\eta)W(\eta_l)]$ that if $K$ increases from 0 to $N - 1$, then the ruggedness of the
fitness landscape increases monotonically. The NKC model is an extension of the
NK model intended for the study of coevolutionary processes. In the NKC model, rigorous
clarification of the relation between the ruggedness of the fitness landscape and the edge
of chaos is an interesting future problem. In connection with it, various problems of
the NK/NKC models are discussed in Ahouse et al.
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Complex  Dynamics  of  Flagella 


A  flagellum  swimming  in  a  viscous  medium  is  modeled  by  a  one-dimensional 
array  of  opposed  active  elements.  The  resultant  model  is  mathematically  described 
by  a  fourth-order  partial  differential  equation.  In  the  model,  the  active  element  is 
characterized  by  both  hysteresis  and  excitability  with  respect  to  the  sliding  motion 
between  the  filaments.  Hysteresis  means  that  the  element  is  either  turned  “on”  or 
“off,”  depending  on  the  history  of  the  sliding  motion.  Excitability  is  defined  when 
active  sliding  is  triggered  by  passive  sliding  over  a  threshold.  The  combination 
of these properties leads to a spatio-temporal sliding pattern within the flagellar
system,  which  in  turn  causes  a  bending  pattern.  Numerical  simulations  for  the 
present  model  reveal  that  (i)  intrinsic  instability  arises  from  this  model  system, 
(ii)  the  direction  of  propagating  waves  is  reversed,  (iii)  such  direction-reversing 
propagating  waves  are  replaced  by  unidirectional  waves  after  the  insertion  of  a 
passive  region  at  one  end,  and  (iv)  the  increase  in  the  system  size  leads  to  chaotic 
behavior. 
FIGURE 1 Propagating waves typical of “normal” flagella. Successive waves (1 → 3)
propagate toward the tip of a flagellum as indicated by the arrow.


1.  INTRODUCTION 

Flagella are hair-like projections which are found on eukaryotic cells.[1] Their
primary function is to move single cells through a fluid for locomotion. Most flagella
show regular base-to-tip bend propagation as illustrated in Figure 1. However,
others show quite complex dynamical behavior such as the reversal of the
direction of propagating waves, collision of waves which travel in the opposite
directions, intermittent movements with stopping and starting transients, and
co-existence of different waves on different sections of a long insect flagellum.
Surprisingly, there is no essential difference in the structure of these flagella. The
problem is, thus, to clarify the underlying mechanism leading to various modes of
complex behavior. Although many theoretical studies have been performed, they
have focused on the regular base-to-tip bend propagation only. No attempt has
been made to understand the potentially important complex behavior.

In the present paper, I will examine the above problem based on recent
theoretical studies.


2.  THE  SLIDING  FILAMENT  MECHANISM 

It is now established that bending waves in flagella are caused by the sliding filament
mechanism. Although actual flagella have nine outer microtubules, they
are approximated by a two-filament system on the assumption that bending occurs
in a single plane. As illustrated in Figure 2, bending does not occur when every
part of the filaments slides equally (Figure 2(B)). If, however, sliding is restricted
to local regions, bending is generated between the sliding and nonsliding regions
(Figure 2(C)). For such bending to be reversed, the direction of sliding must be
reversed (Figure 2(D)). The flagellar system is, thus, modelled by a one-dimensional
array of opposed active elements, each of which has its own “preferred” direction.

[1] Confusingly, bacterial flagella share the same name as those of eukaryotes. They are, however,
completely different in structure and function.

FIGURE  2  Diagrams  showing  how  sliding  motion  causes  bending  motion  in  a  two- 
filament  system.  (A)  The  flagellum  is  straight  and  no  bending  occurs  without  sliding 
motion.  (B)  No  bending  is  initiated  when  sliding  occurs  equally  throughout  the  length 
of  the  flagellum.  (C)  If  sliding  is  localized,  bending  occurs  between  the  sliding  and 
nonsliding  regions.  (D)  When  the  direction  of  sliding  is  reversed,  the  flagellum  bends 
in  the  direction  opposite  to  the  previous  direction  as  shown  in  (C).  The  arrows  indicate 
the  directions  of  relative  sliding. 




Masatoshi  Murase 


3.  DERIVATION  OF  THE  BASIC  EQUATION 

An arc length, s, is introduced to measure the distance along the flagellum from
the base. Then, the sliding displacement, σ, is defined as a function of time, t, and
space, s. Under the condition that sliding is restricted to local regions, we can
assume that the sliding displacement, σ, is proportional to the bending angle, θ,
between a horizontal axis and a line tangent to the flagellum. Once σ is specified,
we can easily obtain the flagellar shape by simple integration (cf. Figure 4). For
convenience, σ is defined as a dimensionless sliding displacement and is allowed to
vary between 0 and 1.
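
The "simple integration" can be illustrated with a short sketch. It assumes, following the formula given in the Figure 4 caption, that the tangent angle is σ − 0.5 (in radians) and that σ is sampled on a uniform grid; the function name and the trapezoidal rule are choices made here for illustration, not taken from the original work.

```python
import math

def flagellar_shape(sigma, ds):
    """Integrate the tangent angle along the arc length to obtain (x, y).

    sigma : sliding displacement sampled at equally spaced points along s
    ds    : grid spacing along the arc length (hypothetical discretization)
    """
    # Assumed relation between sliding displacement and tangent angle
    theta = [v - 0.5 for v in sigma]
    x, y = [0.0], [0.0]  # the base of the flagellum is placed at the origin
    for a, b in zip(theta[:-1], theta[1:]):
        # Trapezoidal rule for the integrals of cos(theta) and sin(theta)
        x.append(x[-1] + 0.5 * (math.cos(a) + math.cos(b)) * ds)
        y.append(y[-1] + 0.5 * (math.sin(a) + math.sin(b)) * ds)
    return x, y
```

With σ = 0.5 everywhere the tangent angle is zero and the sketch returns a straight flagellum lying along the x axis.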

The moment-balance equation for a flagellum is written as

$$M_V + M_S + M_E = 0 \tag{1}$$

where M_V, M_S, and M_E are the external viscous, internal shear, and internal elastic
moments, respectively. To obtain the basic equation, let us specify each moment in
Eq. (1).

First, the external viscous moment, M_V, is given by the external viscous force, F_N:

$$\frac{\partial M_V}{\partial s} + F_N = 0. \tag{2}$$


The external viscous force, F_N, in turn obeys the following force-balance equation:¹²

$$\frac{\partial F_N}{\partial s} + C_N V_N = 0 \tag{3}$$


where C_N and V_N are the normal components of the external viscous drag coefficient
and the velocity, respectively. In Eqs. (2) and (3), inertial terms are ignored because
the Reynolds number of flagella is extremely small. The normal component of the
velocity, V_N, is, then, specified under the condition of continuity:

$$\frac{\partial V_N}{\partial s} = \frac{\partial \sigma}{\partial t}. \tag{4}$$


In Eqs. (3) and (4), translational movements of the flagellum as a whole are
neglected based on the small-amplitude assumption. This simplifies the algebra and
the essential results should not be affected.

Secondly, the internal shear moment, M_S, is defined by the internal shear force, S:

$$\frac{\partial M_S}{\partial s} = S. \tag{5}$$

Lastly, the internal elastic moment, M_E, is proportional to the curvature:

$$M_E = E_B \frac{\partial \sigma}{\partial s} \tag{6}$$
FIGURE 3 The cubic force-distance and hysteresis switching functions. (A) F_I and
F_II are represented by solid and dotted lines, respectively. They are defined as a
function of the sliding displacement, σ. The force constant, Q, is taken as 250 pN.
(B) The binary function is defined in the region 0.2 < σ < 0.8. n_I and n_II take either
of the discrete values 0 or 1 under the condition n_I + n_II = 1.


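where E_B is the bending resistance.

Combining the above equations, we obtain the following basic equation:

$$C_N \frac{\partial \sigma}{\partial t} + \frac{\partial^2 S}{\partial s^2} + E_B \frac{\partial^4 \sigma}{\partial s^4} = 0. \tag{7}$$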


4.  THE  MODEL 

The problem is how to specify the internal shear force, S, in such a way that Eq. (7)
gives rise to various modes of wave phenomena. In the present model, the internal
shear force, S, is defined as follows:

$$S = F_I n_I + F_{II} n_{II} - K_e(\sigma - 0.5) - \gamma \frac{\partial \sigma}{\partial t} \tag{8a}$$

$$F_I = Q(\sigma - 0.1)(\sigma - 0.3)(1 - \sigma) \tag{8b}$$

$$F_{II} = Q(\sigma - 0.9)(\sigma - 0.7)(-\sigma) \tag{8c}$$

$$n_I = \begin{cases} 1 & 0 < \sigma < 0.2 \\ 0 & 0.2 < \sigma < 1 \end{cases} \quad (\text{if initially } n_I = 0 \text{ for } \sigma > 0.2) \tag{8d}$$

$$n_I = \begin{cases} 1 & 0 < \sigma < 0.8 \\ 0 & 0.8 < \sigma < 1 \end{cases} \quad (\text{if initially } n_I = 1 \text{ for } \sigma < 0.8) \tag{8e}$$


where F_I and F_II are two opposing force-distance functions, n_I and n_II are two
switching functions,[2] K_e is the force constant of the passive elastic component,
and γ is the internal viscous resistance. In the following simulations, γ is taken to
be zero except for Section 5.1 because it is negligible in experimental conditions.
Excitability is represented by Eqs. (8b) and (8c), where Q is their force constant.
See Figure 3(A) for details. Hysteresis is represented by Eqs. (8d) and (8e). To avoid
competition between the two opposing elements, it is assumed that n_I + n_II = 1.
See Figure 3(B) for details.
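
The excitable force functions and the hysteresis switch can be sketched in a few lines. The sketch assumes the cubic forms of Eqs. (8b) and (8c), switching thresholds at σ = 0.2 and σ = 0.8 as read from the Figure 3(B) caption, and γ = 0; the function names and the default K_e = 1 (the value quoted for the homogeneous model) are illustrative choices, not the author's implementation.

```python
Q = 250.0  # force constant in pN, as quoted in the Figure 3 caption

def force_I(sigma, q=Q):
    """Cubic force-distance function of subsystem I, Eq. (8b)."""
    return q * (sigma - 0.1) * (sigma - 0.3) * (1.0 - sigma)

def force_II(sigma, q=Q):
    """Cubic force-distance function of subsystem II, Eq. (8c)."""
    return q * (sigma - 0.9) * (sigma - 0.7) * (-sigma)

def update_switch(n_I, sigma, low=0.2, high=0.8):
    """Hysteresis: subsystem I stays on until sigma exceeds `high`,
    and stays off until sigma falls below `low` (assumed thresholds)."""
    if n_I == 1 and sigma > high:
        return 0
    if n_I == 0 and sigma < low:
        return 1
    return n_I

def shear_force(sigma, n_I, k_e=1.0):
    """Shear force of Eq. (8a) with gamma = 0 and n_I + n_II = 1."""
    n_II = 1 - n_I
    return force_I(sigma) * n_I + force_II(sigma) * n_II - k_e * (sigma - 0.5)
```

Note that `force_II(sigma)` equals `-force_I(1 - sigma)`, so the two subsystems are mirror images about σ = 0.5; inside the window 0.2 < σ < 0.8 the value of `n_I` depends on the history of σ rather than on its current value alone.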

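Equations (7) and (8) are solved on the assumption that moments and forces
vanish at both ends. These free-end boundary conditions are:

$$\left. \frac{\partial \sigma}{\partial s} \right|_{s=0,L} = \left. \frac{\partial^2 \sigma}{\partial s^2} \right|_{s=0,L} = 0 \tag{9}$$

where L is the length of the model system.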


5.  SIMULATION  RESULTS 

5.1  INTRINSIC  INSTABILITY 

Although the internal viscous resistance, γ, has been considered to be negligible,
large values of γ are empirically introduced to stabilize the wavelength of simulated
waves in some models. This section investigates the effect of changing the ratio
between the internal viscous resistance, γ, and the external viscous drag coefficient,
C_N, on the stability of solutions to Eqs. (7) and (8). For this purpose, three sets of
values of γ and C_N are used: (i) γ = 50 pN·ms/24 nm, C_N = 0; (ii) γ = 50 pN·ms/24
nm, C_N = 0.5 pN·ms/μm²; and (iii) γ = 0, C_N = 5 pN·ms/μm². A 50-μm-long model
flagellum is set to be homogeneous along the length of the system except that forced
periodic oscillations are applied at one end in order to generate propagating waves.

Figure 4 shows the simulation results. In each case, the sliding displacement,
σ, is plotted against space, s, on the left, and the corresponding bending pattern is
shown on the right. The time interval between two successive patterns is 5 ms.
As the ratio γ/C_N is decreased, the sliding pattern is deformed in two ways (see
left panels) though its corresponding bending pattern does not change as much

[2] Subscripts I and II indicate the two subsystems I and II, respectively.
FIGURE 4 The sliding displacement, σ, as a function of space, s, shown on the
left, and the corresponding bending pattern shown on the right. The model flagellum is
set to be homogeneous (Q = 250 pN and K_e = 1 pN/24 nm for 0 < s < 50 μm)
except that forced oscillations are applied. The period of the oscillations is 60 ms.
The flagellar shapes in the (x, y) coordinates are obtained by
$x(s) = \int_0^s \cos(\sigma - 0.5)\,ds$, $y(s) = \int_0^s \sin(\sigma - 0.5)\,ds$.
Two successive patterns in each panel are shown at 5-ms time intervals. Parameters
are: (A) γ = 50 pN·ms/24 nm, C_N = 0; (B) γ = 50 pN·ms/24 nm, C_N = 0.5 pN·ms/μm²;
and (C) γ = 0, C_N = 5 pN·ms/μm².


(see right panels). First, the plateau phases of the sliding pattern become spiky at
local regions. Since spiky regions are localized, they are caused by the second-order
space derivative term in Eq. (7). Second, the plateau phases are globally inclined.
These global changes result from long-range interactions which are described by
the fourth-order space derivative term in Eq. (7).

The system described by Eq. (7) is subject to intrinsic instability when γ = 0
and C_N = 5 pN·ms/μm² (see Figure 4(C)). In the following simulations, solutions
to Eqs. (7) and (8) are obtained under these conditions as they correspond to the
experimental conditions. Because of the instability inherent in this model system,
the dynamical behavior must be studied over a long time. For this purpose, two
types of representations are used. One is the energy dissipation, which is obtained
by integrating (∂σ/∂t)² with respect to space, s. This simply indicates the intrinsic
instability. The other is a space-time diagram of σ in which the regions where σ > 0.5
are plotted as bars against space, s, at 5-ms time intervals. This plot reflects the
spatio-temporal sliding pattern.
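
Both representations are easy to compute from sampled data. The following sketch assumes that σ is stored as one list per time step on a uniform spatial grid; the finite-difference time derivative, the trapezoidal rule, and the text rendering of the bars are illustrative choices made here, not the procedure used for the original figures.

```python
def energy_dissipation(sigma_prev, sigma_next, dt, ds):
    """Approximate the integral of (d sigma / d t)^2 over s for one time step."""
    # Finite-difference estimate of the squared sliding velocity at each grid point
    rate_sq = [((b - a) / dt) ** 2 for a, b in zip(sigma_prev, sigma_next)]
    # Trapezoidal rule along the arc length
    return ds * (sum(rate_sq) - 0.5 * (rate_sq[0] + rate_sq[-1]))

def space_time_row(sigma, threshold=0.5):
    """One row of the space-time diagram: '#' marks points where sigma > threshold."""
    return "".join("#" if v > threshold else "." for v in sigma)
```

Printing `space_time_row` for snapshots taken every 5 ms, one row per snapshot, gives a coarse text version of the space-time diagram: a wave moving to the right appears as a block of `#` characters drifting rightward from row to row.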


5.2  REVERSAL  OF  PROPAGATING  WAVES 

A 50-μm-long model flagellum has a homogeneous structure, in which opposed
active elements are arranged along the system from one end to the other. This
model system is initially set to be straight except at one end (i.e., the left
end). This initial bend develops and propagates toward the other end (i.e.,
the right end).

Figure 5(A) shows the energy dissipation. A number of spiky patterns exist
which correspond to intrinsic instability. There are two minima in the time course
of the energy dissipation: one is at t = 1120 ms and the other is at t = 2340 ms.
Figure 5(B) shows the space-time diagram of σ. Waves which propagate toward
the right are represented by successive bars moving in the rightward direction. As
indicated by the first arrow at t = 1120 ms, the direction of propagating waves
is reversed. This reversal occurs as follows. The trailing edge of the original wave
first slows down, while the leading edge does not significantly change its propagating
velocity. Then, the wave changes its form and the deformed part sends out a
wave which propagates in the direction opposite to the original direction (i.e., wave
splitting). This new wave collides with the subsequent wave. Since the new wave
is large enough, it can destroy the other. As a result, there are only waves which
propagate toward the left. The next reversal of these propagating waves occurs at
t = 2340 ms as indicated by the second arrow.

If two waves which propagate in opposite directions are identical, they pass
through each other on collision. Non-annihilating propagating waves of this kind are
known as solitons. Non-annihilating waves are also observed in real flagella.
FIGURE 6 The energy dissipation (A) and space-time diagram of σ (B). The flagellum
is set to be inhomogeneous. Parameters are: γ = 0, C_N = 5 pN·ms/μm², Q = 250 pN
and K_e = 50 pN/24 nm for s = 1 μm, Q = 250 pN and K_e = 1 pN/24 nm for
1 < s < 40 μm, and Q = 0 and K_e = 1 pN/24 nm for 40 < s < 50 μm. Simulation
results are shown up to t = 2000 ms.
5.3 INSERTION OF PASSIVE REGION AT ONE END

The model system examined in the previous section demonstrated the reversal of
propagating waves and soliton-like behavior. The problem still remaining is how to
demonstrate unidirectional waves typical of “normal” flagella. To solve this problem,
let us consider the fine structure of sea urchin sperm flagella, which show the regular
waves. These flagella are 41-43 μm long. Each flagellum has an inert terminal piece
5-8 μm long at the distal end and a basal plate at the basal end. Based
on these observations, opposed active elements are removed from the distal 10 μm
of the 50-μm-long model flagellum, and a strong elastic component is placed at the
base. Mathematically, this situation is modeled by setting Q = 0 for 40 < s < 50 μm
and K_e = 50 pN/24 nm for s = 1 μm.
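
As a concrete illustration, the inhomogeneous parameter profile can be written down as two lists. The sketch assumes a grid with one point per micrometre (s = 1, 2, ..., 50 μm), which is a hypothetical discretization chosen here; the parameter values are those quoted in the text and in the Figure 6 caption.

```python
def parameter_profile(length_um=50, passive_from_um=40,
                      q_active=250.0, k_e_default=1.0, k_e_base=50.0):
    """Return (Q, K_e) sampled at s = 1, 2, ..., length_um (in micrometres).

    Q is in pN and K_e in pN/24 nm; the 1-um grid spacing is an assumption.
    """
    q, k_e = [], []
    for s in range(1, length_um + 1):
        # Active elements are removed from the distal region (s > 40 um)
        q.append(0.0 if s > passive_from_um else q_active)
        # A strong passive elastic component is placed at the base (s = 1 um)
        k_e.append(k_e_base if s == 1 else k_e_default)
    return q, k_e
```

Setting `passive_from_um` equal to `length_um` removes the passive terminal region, while `k_e_base=1.0` removes the stiff basal element, which makes it straightforward to compare the homogeneous and inhomogeneous models.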

Figure  6(A)  shows  the  energy  dissipation.  The  peaks  of  spiky  patterns  are 
reduced  extensively.  The  passive  terminal  region  works  like  a  bulk  system  which 
can  absorb  the  instability  arising  from  the  active  region.  Figure  6(B)  shows  the 
space-time diagram of σ. As a result of the reduction of the intrinsic instability,
only  unidirectional  propagating  waves  are  demonstrated. 


5.4  INCREASE  IN  SYSTEM  SIZE 

The model system is set to be homogeneous again, but its length is set to be
100 μm. A single propagating wave is initially present in the system. It propagates
to the right and two waves are reflected at the right end based on the wave splitting
mechanism (see Section 5.2). The first one propagates slowly, while the second
propagates quickly. Since the system size is doubled, the average value of the energy
dissipation is almost doubled as indicated by Figure 7(A). Figure 7(B) shows the
space-time diagram of σ. As indicated by the first arrow, the second wave collides
with the first one at t = 425 ms. After the collision, they continue to propagate.
Collision of two waves which propagate in the same direction is experimentally
observed. Following the collision, the system shows unidirectional propagating waves
for a while. However, as indicated by the second arrow, the spatio-temporal sliding
pattern begins to be chaotic at t = 1260 ms. There are different sections which show
quite different wave parameters such as the wavelengths and wave frequencies. This
chaotic behavior may correspond to the wave patterns observed in a long insect
flagellum.


6.  DISCUSSION 

The most important problem is how to specify the internal shear force, S, in such
a way that Eq. (7) gives rise to various types of wave phenomena. In the present
paper, the shear force, S, was defined as a function of σ under the condition of
γ = 0 in Eq. (8a) as in Sections 5.2-5.4:

$$S = S(\sigma). \tag{10}$$


FIGURE 7 The energy dissipation (A) and space-time diagram of σ (B). The flagellum
is set to be homogeneous. Parameters are: γ = 0, C_N = 5 pN·ms/μm², Q = 250 pN
and K_e = 1 pN/24 nm for 0 < s < 100 μm. Simulation results are shown up to
t = 2000 ms.
It is very difficult to solve the above problem because the system described by
Eqs. (7) and (10) is subject to the intrinsic instability. To understand this situation,
let us consider a simple case where the internal shear force, S, is proportional
to the sliding displacement, σ. Then, the second term in Eq. (7) corresponds to
negative diffusion leading to destabilization, while the third term causes
stabilization. The competition between the two properties leads to intrinsic instability.
Furthermore, there are only even powers of the space derivatives. This means that
symmetry holds with respect to space, s; that is, the equation is invariant under
the spatial inversion s → −s. As a result, both distally propagating and proximally
propagating waves were equally developed.
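
A short linear calculation may make this competition explicit. The sketch below assumes that Eq. (7) has the form C_N ∂σ/∂t + ∂²S/∂s² + E_B ∂⁴σ/∂s⁴ = 0 and that S = kσ with a constant k > 0; the symbols k, q, and ω are introduced here for illustration only and do not appear in the original text.

```latex
C_N \frac{\partial \sigma}{\partial t}
  = -k \frac{\partial^2 \sigma}{\partial s^2}
    - E_B \frac{\partial^4 \sigma}{\partial s^4},
\qquad
\sigma \propto e^{\omega t + i q s}
\;\Longrightarrow\;
\omega(q) = \frac{k q^2 - E_B q^4}{C_N},
\qquad
q_{\max} = \sqrt{\frac{k}{2 E_B}} .
```

Under these assumptions, modes with 0 < q < (k/E_B)^{1/2} grow while shorter wavelengths are damped, and the fastest-growing mode sits at q_max. Because ω depends on q only through q², modes with +q and −q grow at the same rate, which is the linear counterpart of the s → −s symmetry noted above.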

To get unidirectional waves, a structural asymmetry, namely the terminal
piece without active elements, was taken into account. The passive region absorbed
instability arising from the active region. The passive region in isolation does not
show any function, but it can work to control order when it coexists with the
active region. By analogy with this model behavior, it is important to study
network systems (e.g., gene networks, immune networks, and neural networks) which
involve non-active elements.

Besides the present model, two other types of models have been proposed
in order to account for normal base-to-tip bend propagation: curvature-controlled
models and self-oscillatory models. Curvature-controlled models assume that
the shear force, S, is defined as a function of the curvature, ∂σ/∂s:

$$S = S\!\left(\frac{\partial \sigma}{\partial s}\right). \tag{11}$$


To understand the meaning of Eq. (11), let us consider a simple case in which the shear
force, S, is proportional to the curvature, ∂σ/∂s. Then Eq. (7) is no longer
symmetric with respect to space, s, because of the presence of an odd power of
the space derivative. As a result, either distally or proximally propagating waves
are present depending on the sign of the proportionality constant. However, once
the sign of the constant is specified, these models cannot account for two waves
propagating in opposite directions. Furthermore, there is no direct experimental
evidence which supports Eq. (11).

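Self-oscillatory models assume high internal viscosity, γ, to get unidirectional
propagating waves. Here, the shear force, S, is conventionally represented as follows:

$$S = \tilde{S} - \gamma \frac{\partial \sigma}{\partial t}, \tag{12}$$

where $\tilde{S}$ denotes the part of the shear force that does not depend on the
sliding velocity. Let us consider the extreme case of C_N = 0. Equation (7) can be
reduced to the following reaction-diffusion equation:

$$\gamma \frac{\partial \sigma}{\partial t} = E_B \frac{\partial^2 \sigma}{\partial s^2} + \tilde{S}. \tag{13}$$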


In this case, it is easy to get unidirectional propagating waves if an appropriate
pace-maker is placed at one end of the system. However, the internal viscosity, γ, is
generally considered to be negligible, which is inconsistent with Eq. (12). It is now
clear that any model except for the present model is based on ad hoc assumptions
to account for regular wave phenomena.

Intrinsic instability has not been discussed in the field of cellular motility. One
reason for this is that theoreticians have focused on the regular behavior, though
there are experimental observations of irregular modes of wave phenomena. Another
reason is that it is very difficult to grasp the deformed patterns from the flagellar
shape only (see right panels of Figure 4). For these reasons, the observed
irregularity has been ascribed to random noise. Equations similar to Eq. (7) have been
discussed in different physical contexts. For example, the Kuramoto-Sivashinsky
equation and the generalized reaction-diffusion equation have this class of
intrinsic instability. Numerical simulations for these equations show complex
dynamics. Despite the diversity of dynamical systems, it is very interesting to notice
that there may be a common principle behind them. I hope that the present study
stimulates the investigation of such a principle.
Cellular Automata with Non-Uniform Rules:
An Illustration of Kauffman’s Boolean Network Theory


INTRODUCTION 

Kauffman introduced random Boolean networks in order to study the
phenomenon that every multi-cellular organism has a (limited) number of different cell
types, although the genetic material in all cells is identical. Each node in the
network is a binary automaton, which can be either on or off (true or false). Every
automaton in the network is connected to K other automata. An automaton will
change its state according to a transition function, which is based upon the states
of the connected automata. If K = 2, there exist 2^(2^K) (= 16) different transition
functions, each determining in a different way the effect of the two connected
automata. Out of these 2^(2^K) possible transition rules, one rule is randomly assigned
to every automaton in the network. For the rest of its “lifetime,” the automaton
will obey this transition rule. The assignment of the transition rules is done by
filling in a look-up table of all possible input configurations with ones and zeros;
every position in this table will have a probability p to become one, and (1 − p)
to become zero (usually p = 0.5). The model constructed in this way represents the
assumed network of interacting genes in a cell.
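
The construction just described can be sketched as follows. This is a minimal illustration under stated assumptions rather than Kauffman's own procedure: the function names are hypothetical, each node draws its K inputs uniformly from the other nodes, and all nodes are updated synchronously.

```python
import random

def random_boolean_network(n_nodes, k, p=0.5, seed=None):
    """Build random wiring and random look-up tables for a Boolean network."""
    rng = random.Random(seed)
    inputs, tables = [], []
    for i in range(n_nodes):
        others = [j for j in range(n_nodes) if j != i]
        inputs.append(rng.sample(others, k))  # K distinct input nodes
        # One table entry per input configuration; each entry is 1 with probability p
        tables.append([1 if rng.random() < p else 0 for _ in range(2 ** k)])
    return inputs, tables

def step(state, inputs, tables):
    """Synchronously update every node from the states of its K inputs."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for j in node_inputs:
            index = 2 * index + state[j]  # read the inputs as a binary number
        new_state.append(table[index])
    return new_state
```

For K = 2 each table has 2² = 4 entries, so there are 2⁴ = 16 possible tables, matching the count given above. Iterating `step` from a random initial state until a state repeats yields one attractor of the network.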

These Boolean networks have been analyzed for different K and p, and they
show interesting behavior. They possess a limited number of attractors,
but also exhibit chaotic behavior if K > 2 (at p = 0.5). The set of attractors
in a Boolean network could be linked to the set of cell types, by assuming that
epistatic interactions between genes (e.g., gene regulation) do not allow all possible
configurations, but rather force the network of connected genes to a strongly limited
number of patterns of genetic activity.

Kauffman’s Boolean networks have been further analyzed by using the cellular
automaton formalism, thus putting local, spatial constraints upon the interactions.
These studies mainly focused upon the phase transition between
frozen and chaotic behavior, damage spreading, and fractal dimension in
relation to the percolation threshold. In most cases the parameter p
has been used in order to study these phenomena. For two-dimensional cellular
automata and K = 4, the transition between fixed-point or periodic behavior and
chaotic behavior is at approximately p = 0.31. Above this critical p chaos will arise,
and damage does not remain localized.

The  aim  of  this  paper  is  to  exploit  the  cellular  automaton  formalism  to  illustrate 
the  main  results  of  Kauffman’s  Boolean  network  theory  and  to  show  the  beauty 
of  the  patterns  that  arise.  We  believe  that  the  work  on  this  subject  is  lacking  a 
visualization  of  the  rich  dynamics  of  these  networks.  So,  we  will  show  the  dynamics 
of  one-  and  two-dimensional  cellular  automata  with  non-uniform  rules. 

The patterns of behavior of two-dimensional cellular automata can be difficult
to grasp, even when displayed as a movie. However, by using one-dimensional cross
sections, apparently chaotic two-dimensional behavior displays an amazing amount
of structure (compare Poincaré sections). We already applied this technique
successfully in the study of cellular automata in another context, and the present
study is another example of its usefulness.

Furthermore, we will discuss the shortcomings of p as a parameter to
characterize the system. For reasons of simplicity we will start with one of the simplest
cellular automata: one-dimensional with K = 2.


FIXED-POINT  AND  PERIODIC  BEHAVIOR 

Starting  with  an  even  mixture  of  all  possible  rules  in  a  cellular  automaton  of  100 
cells,  we  observe  an  amazing  variety  of  localized  fixed-point  and  periodic  behavior, 
which  emerges  after  only  a  few  generations  (Figure  1). 
FIGURE 1 Behavioral patterns in a non-uniform one-dimensional cellular automaton
(K = 2). Four different time series (a-d) of a one-dimensional cellular automaton of 100
cells (K = 2). Each replicate uses a different initial configuration as well as a different
pattern of rules.


We examined a very small part of the “genome” (N = 10) in order to determine
the number of attractors and the extent of their basins of attraction. To this end, the
outcome of all 2^10 (= 1024) possible initial configurations has been studied. The
results, presented in Figure 2, show that there are only four different types of
behavior. This number is of the order of magnitude of √N, as observed in Kauffman’s
Boolean networks and also sometimes estimated to be the number of cell types
in multi-cellular organisms. The four behavioral patterns are all periodic, with
periods 3, 4, 6, and 12. The basins of attraction are 112, 128, 272, and 512.
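
An exhaustive search of this kind can be sketched as follows. The sketch assumes periodic boundaries and a rule numbering in which the four outputs for the inputs (left, right) = 00, 01, 10, 11 are read as a binary number, so that rule 1 is AND and rule 6 is XOR as in the text; the actual rule pattern behind Figure 2 is not given here, so `rules` is an arbitrary input.

```python
from itertools import product

def rule_output(rule, left, right):
    """Look up a K = 2 rule (0-15) for the given left and right neighbor states."""
    return (rule >> (3 - (2 * left + right))) & 1

def step(state, rules):
    """One synchronous update of a ring of cells, each with its own rule."""
    n = len(state)
    return tuple(rule_output(rules[i], state[i - 1], state[(i + 1) % n])
                 for i in range(n))

def find_attractors(rules):
    """Map each attractor (labelled by its smallest state) to (period, basin size)."""
    basins = {}
    for start in product((0, 1), repeat=len(rules)):
        seen = {}
        state, t = start, 0
        while state not in seen:      # iterate until a state repeats
            seen[state] = t
            state = step(state, rules)
            t += 1
        period = t - seen[state]
        cycle, nxt = [state], step(state, rules)
        while nxt != state:           # walk once around the cycle
            cycle.append(nxt)
            nxt = step(nxt, rules)
        key = min(cycle)
        _, size = basins.get(key, (period, 0))
        basins[key] = (period, size + 1)
    return basins
```

For N = 10 the loop visits all 1024 initial configurations, so the basin sizes returned for any rule pattern always add up to 1024, as they do for the four basins reported above.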
FIGURE 2 Attractors in a one-dimensional CA of length 10.


FORCING  AND  NON-FORCING  FUNCTIONS 

Kauffman attributed his results to the special properties of certain rules: certain
rules exhibit a forcing (or canalizing) effect. This means that a cell obeying
one of these particular rules can be forced to a state by only one of the
neighbors, regardless of the state(s) of the other neighbor(s). In Table 1 all rules of the
one-dimensional (K = 2) cellular automaton are listed together with their
characterization as forcing, half-forcing, non-forcing, or immune. The clearest example
of forcing is formed by the rules that copy the state of one of the two
neighbors to the cell that obeys the rule (rules 3 and 5). Another example is the
logical AND function (rule 1); if one of the neighbors is zero, it does not matter
what the state of the other neighbor is; the outcome will be zero. However, if one
neighbor is one, then the outcome is determined by the state of the other neighbor.
This is the reason why we call this rule “half-forcing.” The logical function
exclusive OR (XOR, rule 6) is an example of the opposite of forcing; in all cases both
neighbors will determine the outcome together. Another rule with special
properties is the rule which keeps a cell clamped to a state, regardless of the states of the
neighborhood (rules 0 and 15); this is what we call an “immune” rule.
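
The four categories can be made operational with a small classifier. The sketch below is one reading of the definitions in the text, not a reproduction of Table 1: a rule is called immune if its output is constant, forcing if the output is fixed by one neighbor for both of that neighbor's values, half-forcing if only some single neighbor value fixes the output, and non-forcing otherwise. The rule numbering is assumed to read the outputs for (left, right) = 00, 01, 10, 11 as a binary number.

```python
def rule_output(rule, left, right):
    """Look up a K = 2 rule (0-15) for the given left and right neighbor states."""
    return (rule >> (3 - (2 * left + right))) & 1

def classify(rule):
    """Classify a K = 2 rule as immune, forcing, half-forcing, or non-forcing."""
    out = {(l, r): rule_output(rule, l, r) for l in (0, 1) for r in (0, 1)}
    if len(set(out.values())) == 1:
        return "immune"          # output never depends on the neighbors
    # Does fixing the left (or right) neighbor at value v fix the output?
    left_forces = [out[(v, 0)] == out[(v, 1)] for v in (0, 1)]
    right_forces = [out[(0, v)] == out[(1, v)] for v in (0, 1)]
    if all(left_forces) or all(right_forces):
        return "forcing"         # one neighbor alone determines the output
    if any(left_forces) or any(right_forces):
        return "half-forcing"    # only one value of a neighbor forces the output
    return "non-forcing"         # both neighbors always matter
```

Under this reading, 2 of the 16 rules are immune (0 and 15), 4 are forcing (3, 5, 10, 12), 8 are half-forcing, and 2 are non-forcing (6 and 9), so a uniform random draw of rules contains 12.5% non-forcing rules.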
TABLE 1 Rules with K = 2
L=Left, R=Right, I=Immune, F=Forced, H=Half-forced, N=Non-forced


Closer examination of the results of the experiment presented in Figure 2 reveals
that if the input pattern at positions 6, 7, and 8 is 1-1-1, the system will always
end in the cycle of period 4, regardless of the states of the other positions (since
this happens in 12.5% of the cases, the extent of the basin of attraction is easy to
understand: 0.125 × 1024 = 128). The rules 5 (F), 1 (H), and 7 (H) at these positions
force the entire system into this pattern.

A parameter often used in the analyses of uniform, simple one-dimensional
cellular automata with K = 2 or 3 is λ (the proportion of non-zeros in a
rule).¹²,¹³,¹⁹,²⁰,²¹ For one-dimensional cellular automata with five states and
K = 4, Langton showed that by varying λ between 0.0 and 1.0, one goes from
fixed point to periodic to chaotic behavior and back again. However, several studies
have shown that λ alone is not capable of characterizing the rule space sufficiently
if either the number of states or K is small. Other parameters have been
suggested, among which is the so-called dependency. This dependency parameter
is analogous to the extent of forcing (a forcing rule has a low value of dependency,
whereas a non-forcing rule has the highest value of dependency). We also see a
correspondence between λ and the parameter p, as used in the analyses of cellular
automata with non-uniform rules. We therefore followed Hartman, who studied
two-dimensional cellular automata with non-uniform rules (combinations of AND
and XOR), by making a series of runs of the one-dimensional (K = 2) cellular
automaton with different proportions of the non-forcing rules (rules 6 and 9).
The results are presented in Figure 3, in which the transition from fixed-point and
periodic behavior to chaotic behavior can be observed. The effect of forcing rules is
dramatic: at a proportion of 80 or 85 percent non-forcing rules, highly structured
regions appear in the time plots if the local density of forcing rules is high, whereas
the intermediate “non-forcing regions” still show localized chaotic behavior. The
emergent chaotic behavior in these simulations seems to contradict the results of
Stauffer, who concluded that chaotic behavior does not occur in one-dimensional
cellular automata with K = 2. However, in his simulations the proportion of
non-forcing rules remained fixed at 12.5%.
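
A series of runs of this kind can be sketched as follows. The sketch assumes periodic boundaries, the rule numbering in which rule 1 is AND and rule 6 is XOR, and a uniform random choice among the remaining 14 rules for the cells that do not receive a non-forcing rule; these choices and all function names are illustrative, since the exact sampling procedure is not specified in the text.

```python
import random

NON_FORCING = (6, 9)  # XOR and its complement
OTHER_RULES = tuple(r for r in range(16) if r not in NON_FORCING)

def rule_output(rule, left, right):
    """Look up a K = 2 rule (0-15) for the given left and right neighbor states."""
    return (rule >> (3 - (2 * left + right))) & 1

def assign_rules(n_cells, fraction_non_forcing, rng):
    """Give a fixed fraction of randomly placed cells a non-forcing rule."""
    n_nf = round(fraction_non_forcing * n_cells)
    rules = [rng.choice(NON_FORCING) for _ in range(n_nf)]
    rules += [rng.choice(OTHER_RULES) for _ in range(n_cells - n_nf)]
    rng.shuffle(rules)
    return rules

def run(rules, state, generations):
    """Return the time series (list of states) of the non-uniform automaton."""
    n = len(rules)
    history = [list(state)]
    for _ in range(generations):
        prev = history[-1]
        history.append([rule_output(rules[i], prev[i - 1], prev[(i + 1) % n])
                        for i in range(n)])
    return history
```

For example, `assign_rules(200, 0.85, random.Random(0))` followed by `run(rules, state, 100)` with a random initial `state` produces a time series of the same size as one panel of Figure 3; printing each row as a string of characters gives a rough view of the structured and chaotic regions.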
FIGURE 3 The effect of non-forcing rules in a one-dimensional CA (K = 2). Time
series (100 generations) of a one-dimensional cellular automaton of 200 cells (K = 2)
with different percentages of non-forcing rules (rules 6 and 9). On top of every time
plot, the rule type of every cell is indicated by a white bar if it obeys a non-forcing rule,
or a black bar if it obeys a forcing, half-forcing, or immune rule. Each replicate uses a
different initial configuration as well as a different pattern of rules.
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FIGURE  4  The  effect  of  non-forcing  rules  in  a  two-dimensional  CA  (K  =  5)  Results  of 
a two-dimensional cellular automaton of 100 × 100 cells (on a torus), with K = 5 and
different percentages of the non-forcing XOR. (a) The states at t = 100. (b) Time series
(100 generations from t = 100 to 200) of row 50. (c) Positions of fixed points, showing
the "frozen" regions. (d) Positions of cycles with period 2. The rules which have been
used in these simulations are (with N, S, E, and W representing the neighboring cells
and C the center cell): 0% non-forcing: C or E or W or N or S (OR-rule), C and E
and W and N and S (AND-rule), C or E or W XOR N and S, C or E or W or N XOR S;
25% non-forcing: C or E or W XOR N and S, OR-rule, AND-rule, XOR-rule; 50% non-forcing:
OR-rule, AND-rule, XOR-rule; 100% non-forcing: C XOR E XOR W XOR N XOR S
(XOR-rule). It takes some time before the system attains its attractor,
so (c) and (d) are obtained by recording the fixed points and
cycles of period 2 from t = 100 to t = 200 generations. Only in the system with 2%
non-forcing rules did we observe a small number of cells (< 10) that had not yet settled
into their local periodic or fixed-point behavior after 200 generations.


TWO-DIMENSIONAL  CELLULAR  AUTOMATA 

The  above  results  extended  to  two-dimensional  cellular  automata  are  shown  in 
Figure  4.  For  practical  reasons,  we  do  not  assign  every  cell  a  rule  from  the  whole 
set  of  possible  rules  (which  is  2^^  ^  =  5.9  *  10*^),  but  we  draw  our  rules  from  a  small 
subset,  which  (of  course)  includes  the  non-forcing  XOR  function. 

Figure 4 (a) shows the states at t = 100. These static plots do not show any
differences. Figures 4 (c) and (d) show the positions of, respectively, the fixed points
(c) and the fixed points or cycles of period 2 (d), in order to show the extent of the
“frozen” regions. However, the time series of the one-dimensional sections (b) provide
a much better insight into the qualitative behavior of the two-dimensional system. It
is striking that in these two-dimensional systems, too, the patterns are extremely
localized. Again, increasing the proportion of non-forcing rules yields a transition
from fixed-point and periodic behavior to chaotic behavior.


CONCLUSION 

The analyses of the rule space of uniform cellular automata showed that one parameter
(usually λ) is not sufficient to characterize the rich behavior of these systems.
Therefore, we advocate the inclusion of more parameters in the analysis of non-uniform
cellular automata. The proportion of non-forcing rules is an important
parameter in these systems.
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Random  Boolean  Networks:  Comparison 
Between  Randomly  Connected  and  Lattice- 
Connected  Networks 


Random  Boolean  networks  exhibit  self-organizing  properties  and  can  be 
used  as  a  model  of  the  biological  cell.  Below  I  show  that  network  geometry— 
the  way  in  which  nodes  are  connected — produces  networks  with  different 
behaviors. Network behavior is affected by the method used to calculate the
subsequent state of the network, the absence of certain Boolean functions, and
the size of the network. The behavior of a variety of networks is revealed
through  computer  simulation. 

1.  INTRODUCTION

Random Boolean networks have been shown to exhibit self-organizing properties;
particularly striking results are achieved in the nets with connectivity of two
(K = 2).¹,²,³,⁴,⁵ A network with N nodes (each can take the value 0 or 1) has 2^N
potential states, but on average only a small proportion of those, approximately
N states, are stable (i.e., belong to the cycles). The number of cycles reached by
the network and the cycle length are both very small, approximately √N. Kauffman
showed  that  the  above  networks  can  be  used  as  a  model  of  a  biological  cell  or  some 
aspects  of  it."*'®  In  this  model  individual  genes  are  represented  by  the  nodes  of  the 
network, and interactions between genes are modeled by the connections between the
nodes. A network cycle is then viewed as a cell type; the length of the cycle is associated
with the time between cell divisions. The majority of the data about the networks
is derived from computer simulations, although theoretical predictions have been
made.^'^ The results of computer simulation presented in this paper argue that randomly
connected and lattice-connected networks exhibit different behavior under
an identical set of parameters. The effect of several basic parameters was studied: the
method used for updating the nodes during calculation of the network’s next state,
exclusion of Contradiction and Tautology Boolean functions, and increasing the size
of the network. Lattice-connected networks are very sensitive to changes in these
parameters: the number of cycles and the length of the run-in increase dramatically,
causing loss of some self-organizing properties of the network. Those types of
networks cannot, therefore, be a successful model of a biological cell. Furthermore,
some combinations of parameters (synchronous node update together with the use of
only 14 Boolean functions) cause randomly connected networks to go through an
extensive number of steps (< 5N, where N is the number of nodes in the network) before
a cycle is found. Only selected types of networks exhibit biologically plausible
behavior.


2.  DEFINITIONS 

2.1  BASIC  ELEMENTS  AND  BEHAVIOR  OF  THE  NETWORK 

A  network  is  constructed  using  basic  elements  called  nodes.  Each  node  is  a  binary 
device,  taking  values  of  0  or  1.  A  node  calculates  its  state  based  on  inputs  from 
other  nodes  and  its  internal  logical  function  called  a  Boolean  function.  The  state  of 
the  node  serves  as  an  input  to  other  nodes.  The  state  of  the  network  can  be  defined 
as  a  joint  state  of  all  its  nodes  at  any  given  time.  Starting  in  any  of  the  possible 
2^N initial configurations, the network passes through some sequence of states until
it comes to one of the previously encountered states, closing the cycle. From that
point on, the network traverses the same subset of states, since the transitions
between network states are fully deterministic. The number of states comprising a
given  cycle  will  determine  its  length.  Depending  on  the  initial  state  of  the  network, 
different  cycles  can  be  reached.  The  sequence  of  states  that  the  network  traverses 
before  it  reaches  a  cycle  is  called  the  run-in.  Networks  can  leave  a  cycle  and  move 
into  another  cycle  upon  introduction  of  noise  (noise  can  be  viewed  as  a  temporary 
switch  in  the  state  of  one  or  more  nodes). 
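
Because the transitions are deterministic, the run-in and the cycle length of a single run can be measured by remembering when each state was first visited. The following is a minimal sketch; the function name `run_in_and_cycle` and the toy three-node map are illustrative assumptions, not part of the simulations reported here.

```python
from typing import Callable, Dict, Hashable, Tuple


def run_in_and_cycle(step: Callable[[Hashable], Hashable],
                     start: Hashable) -> Tuple[int, int]:
    """Return (run-in length, cycle length) for a deterministic map `step`.

    The run-in is the number of states visited before the cycle is entered;
    the cycle length is the number of states that make up the cycle.
    """
    first_seen: Dict[Hashable, int] = {}
    state, t = start, 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state)
        t += 1
    run_in = first_seen[state]   # time at which the repeated state first appeared
    return run_in, t - run_in


def toy_step(state):
    """Hypothetical 3-node network: nodes 0 and 1 swap, node 2 is (node 0 AND node 2)."""
    a, b, c = state
    return (b, a, a & c)


print(run_in_and_cycle(toy_step, (1, 0, 1)))  # prints (2, 2)
```

Starting from (1, 0, 1), the toy network visits (1, 0, 1) and (0, 1, 1) once and then alternates between (1, 0, 0) and (0, 1, 0), so the run-in is 2 and the cycle length is 2. Storing every visited state is practical here because the reported run-ins and cycles are short compared with the 2^N possible states.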
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2.2  CONNECTION  BETWEEN  NODES 

Each node in the network is connected to other nodes. Connectivity of two (K = 2)
means  that  each  node  has  two  nodes  from  which  it  receives  inputs  and  two  nodes 
(possibly  different  ones)  to  which  it  sends  its  output.  Two  possible  ways  of  making 
connections  between  nodes  have  been  explored:  randomly  connected"*  ®  and  lattice- 
connected  networks.  In  randomly  connected  networks  the  connections  between 
nodes  are  assigned  randomly.  In  lattice-connected  networks  the  immediate  four 
neighbors  of  the  nodes  are  used  for  connection — two  serve  as  input  nodes  and  two 
as output nodes. The latter method of connection is useful for easy visualization of
closely  interacting  nodes. 
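
The two wiring schemes can be sketched as follows. The text does not specify which two of the four lattice neighbors serve as inputs, nor how the boundary is treated, so this sketch assumes inputs from the northern and western neighbors (so that outputs go south and east) and periodic boundaries. For the random wiring it assumes two independent random permutations, which is one simple way of giving every node exactly two inputs and two outputs. All function names are hypothetical.

```python
import random
from collections import Counter


def lattice_inputs(side):
    """Inputs of node (r, c) are its northern and western neighbors on a torus."""
    inputs = []
    for r in range(side):
        for c in range(side):
            north = ((r - 1) % side) * side + c
            west = r * side + (c - 1) % side
            inputs.append((north, west))
    return inputs


def random_inputs(n_nodes, seed=0):
    """Two random permutations: node i reads first[i] and second[i].

    Every node then appears exactly twice as a source, i.e. has two outputs.
    Self-connections and duplicate inputs are not excluded in this sketch.
    """
    rng = random.Random(seed)
    first = list(range(n_nodes))
    second = list(range(n_nodes))
    rng.shuffle(first)
    rng.shuffle(second)
    return list(zip(first, second))


def out_degrees(inputs):
    """Count how many times each node is used as an input by other nodes."""
    return Counter(src for pair in inputs for src in pair)
```

Both functions return, for every node, the pair of nodes it reads from, so the same update code can be used for either geometry.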


2.3  TECHNIQUE  OF  NODE  UPDATE 

Each  node  of  the  network  calculates  and  updates  its  state  at  every  iteration  of  the 
network. The network develops in discrete time steps: outputs of the nodes at time
(t − 1) serve as the inputs into the nodes at time (t − 1) and, therefore,
determine the outputs at time t. There are two possible temporal ways in which
the  node  update  can  be  done: 

■  All nodes in the network update their state simultaneously: at time (t − 1)
all nodes look at their inputs and calculate the outputs for time t. This can
be seen as a synchronous or parallel process (this is the type of update that is
claimed to be used in the majority of the simulations.^ "*)

■  Nodes update their state asynchronously; that is, there is some order in which
the update is performed until all nodes calculate their state, at which point
the network reaches its subsequent state. This is a sequential process; it can
be viewed as a multistep (precisely, N-step, where N is the number of nodes in the
network) process. Therefore the time unit t consists of N intermediate steps,
the last step coinciding with time t of the simultaneous update:
t = t_N. At each step the network’s state is affected by a change in the state of
only one node (the node that is being updated at that step). The first (N − 1)
steps of this process are transient and only the final step is a “real” state of the
network.

Both  types  of  update  are  considered  here  to  be  a  one-step  iteration  from  the 
point  of  view  of  the  network. 
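
The difference between the two update techniques can be made concrete with a small sketch. The encoding of a two-input Boolean function as a 4-bit integer, the helper names, and the three-node example are assumptions made for illustration only.

```python
def apply_rule(rule, a, b):
    """Output of two-input Boolean function `rule` (0..15) for inputs a, b.

    Bit 2*a + b of `rule` holds the output for the input pair (a, b).
    """
    return (rule >> (2 * a + b)) & 1


def synchronous_step(state, inputs, rules):
    """All nodes read the old state and switch simultaneously."""
    return tuple(apply_rule(rules[i], state[a], state[b])
                 for i, (a, b) in enumerate(inputs))


def asynchronous_step(state, inputs, rules, order):
    """Nodes are updated one at a time in `order`; later nodes see earlier changes.

    The N intermediate configurations are transient; only the returned
    configuration counts as the next state of the network.
    """
    work = list(state)
    for i in order:
        a, b = inputs[i]
        work[i] = apply_rule(rules[i], work[a], work[b])
    return tuple(work)


# Three nodes on a ring; rule 12 copies the first input, so node i copies node i - 1.
inputs = [(2, 0), (0, 1), (1, 2)]
rules = [12, 12, 12]
print(synchronous_step((1, 0, 0), inputs, rules))               # prints (0, 1, 0)
print(asynchronous_step((1, 0, 0), inputs, rules, (1, 2, 0)))   # prints (1, 1, 1)
```

In the synchronous case the 1 advances by one node per iteration. With the update order (1, 2, 0) the asynchronous step carries it around the whole ring within a single iteration, which shows how a signal can traverse several nodes in one network step. The outcome depends on the chosen order: with order (0, 1, 2) the same starting state goes to (0, 0, 0).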


2.4  BOOLEAN  FUNCTIONS 

There  are  16  transformations  with  two  inputs,  called  Boolean  functions.  Most  of 
the  16  Boolean  functions  with  two  inputs  are  forcing  functions.  The  term  “forcing” 
means  that  the  function’s  output  can  be  determined  by  one  input  value,  independent 
of the value of the second input. The forcing input is the value of the input that
determines the output; the forced output (or value) is the outcome of the function




Stella  Veretnik 


under  the  forcing  input.  Two  out  of  16  Boolean  functions  are  not  forcing:  these 
are  Equal  and  Exclusive  OR.  Two  other  functions  (Contradiction  and  Tautology) 
output  the  same  value  under  any  input.  For  these  and  the  rest  of  the  functions,  one 
or  both  input  values  can  be  forcing. 
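
The classification of the 16 functions can be checked by brute force. The sketch below assumes the common encoding of a two-input function as a 4-bit truth table (bit 2a + b holds the output for inputs a, b); the function names are illustrative.

```python
def truth_table(rule):
    """Map each input pair (a, b) to the output bit of function `rule` (0..15)."""
    return {(a, b): (rule >> (2 * a + b)) & 1 for a in (0, 1) for b in (0, 1)}


def forcing_inputs(rule):
    """List every (input index, forcing value, forced output) of a function.

    An input value is forcing if fixing it determines the output regardless
    of the other input.
    """
    table = truth_table(rule)
    found = []
    for idx in (0, 1):
        for value in (0, 1):
            outputs = {out for pair, out in table.items() if pair[idx] == value}
            if len(outputs) == 1:
                found.append((idx, value, outputs.pop()))
    return found


non_forcing = [rule for rule in range(16) if not forcing_inputs(rule)]
print(non_forcing)          # prints [6, 9]
print(forcing_inputs(8))    # AND: prints [(0, 0, 0), (1, 0, 0)]
```

Under this encoding the two non-forcing functions are 6 (Exclusive OR) and 9 (Equal), leaving 14 forcing functions. For Contradiction (0) and Tautology (15) every value of either input is forcing, since the output never changes.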


2.5  FORCING  STRUCTURES 

When a forced output from one node in the network turns out to be a forcing input
into another node, the basic unit of a forcing structure appears. Forcing structures
vary in length; a forcing input into the forcing structure is guaranteed to propagate
along it, forcing all the nodes along the path into their forced values. Closing a
forcing structure on itself creates a forcing loop; it is a more powerful structure than
a linear forcing structure, and its propagating signal eventually reinforces itself.®
Forcing structures are abundant in random Boolean networks with two inputs
because the majority of the Boolean functions (14 out of 16) are forcing functions.
Forcing  structures  (loops,  in  particular)  will  “freeze”  parts  of  the  network  in  a 
particular  mode,  artificially  reducing  the  number  of  potential  states  of  the  network 
and,  therefore,  contributing  to  the  self-organizing  behavior. 


2.6  PREVIOUSLY  PREDICTED  BEHAVIOR  OF  THE  NETWORK 

Kauffman’s simulations'*® predicted that for randomly connected networks with
two inputs, the average cycle length is √N and the average number of cycles is √N,
where N is the size of the network. When perturbed, the network is expected to return
to  its  original  cycle  in  approximately  90%  of  the  cases. 


3.  SIMULATIONS 

A set of parameters is chosen for the network simulation and run repeatedly under
multiple conditions. The results of the simulations are averaged and are interpreted
as a “tendency in the behavior” for a particular type of network.


3.1  INPUT  PARAMETERS  UNDER  INVESTIGATION 

All networks have connectivity of two (K = 2); a network simulation is selected
according to:

■  type  of  connection  between  nodes  (random  or  lattice) 

■  technique  of  node  update  (synchronous,  asynchronous) 

■  subset  of  the  Boolean  functions  used  (14  or  16) 
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Every  combination  of  the  above  parameters  specifies  a  type  of  network;  thus,  a 
total  of  eight  different  types  of  networks  are  simulated. 

A  network  is  determined  by  assigning  node  connections  and  Boolean  functions. 
Many  different  networks  of  each  type  are  simulated.  Each  network  is  simulated 
under  multiple  initial  node  states,  and  results  are  averaged  to  determine  behavior 
of  that  network. 

Results from different networks are later averaged as well in order to come up
with the average behavior of the network type under a specific set of parameters (such as
the  method  of  connections,  technique  of  node  update,  or  exclusion  of  some  Boolean 
functions).  This  second  average  represents  behavior  of  the  network  independent  of 
the  specific  Boolean  function  or  connections  between  nodes  (in  the  case  of  randomly 
connected  nets). 


3.2  OUTPUT  PARAMETERS  OF  INTEREST 

Every  type  of  network  can  be  characterized  by  several  averaged  parameters,  in 
particular: 

■  number  of  different  cycles  to  which  networks  arrive 

■  length of the cycles, both weighted (all cycles are included) and not-weighted (only
unique cycles are considered)

■  length  of  the  run-in  (how  long  it  takes  before  the  network  arrives  at  a  cycle) 


3.3  SIZE  OF  THE  NETWORKS 

Simulations  are  done  on  networks  of  two  sizes. 

1.  100-node networks (10 × 10): 100 different networks of each type are searched;
each is run under 400 initial conditions. Simulations are done on a Mac II
and  a  Sun4. 

2.  900-node networks (30 × 30): a small number of networks is used; each is
run under 600 initial conditions. Simulations are done on a Cray X-MP. Only
selected types of networks were simulated on the Cray.


4.  RESULTS 

The  effects  of  three  different  parameters  on  the  behavior  of  the  random  Boolean 
networks  with  two  possible  geometries  of  node  connection  (randomly  connected  and 
lattice  connected)  have  been  studied: 

1.  Exclusion  of  Contradiction  and  Tautology  functions  out  of  the  set  of  16  possible 
Boolean  functions. 
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2.  Asynchronous  vs.  synchronous  update  of  nodes  in  the  network. 

3.  Increasing  the  number  of  nodes  in  the  network. 

Lattice-connected and randomly connected networks respond differently to changes in
these parameters; in particular, lattice-connected networks appear to be more “sensitive” to
the changes than randomly connected networks. In this section I compare the behavior
of lattice-connected and randomly connected networks. Note that the effect
of the first two parameters is studied on the networks of 100 nodes.
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4.1  EFFECT  OF  THE  NODE  UPDATE  TECHNIQUE 

TYPES  OF  THE  NODE  UPDATE  There  are  two  different  techniques  of  node  update 
that  are  used  in  the  simulations: 

1.  Synchronous  update:  all  nodes  of  the  network  are  updated  simultaneously. 

2.  Asynchronous  update:  nodes  are  updated  in  some  predetermined  sequence. 
Two different types of sequential update are tried: (1) lattice-ordered, that is,
nodes  are  updated  in  their  numerical  order  (left  to  right,  top  to  bottom),  and 
(2)  random,  that  is,  nodes  are  updated  in  random  order  (which  is  established 
once  for  each  network). 

Results produced under lattice-ordered and random update are essentially identical,
as expected; only the random type of asynchronous update is reported here.

RANDOMLY CONNECTED NETWORKS have, essentially, the same behavior under
the synchronous and asynchronous update. All three studied parameters (number
of cycles, cycle length, and length of run-in) tend to be 2-3 times larger under the
synchronous update. See Figure 1 (compare histograms A (asynchronous update)
with histograms B (synchronous update)); also, compare lines 1 and 3 in Table 1.

LATTICE-CONNECTED  NETWORKS  show  an  increase  in  the  number  of  cycles  that 
the  network  reaches,  while  the  length  of  the  cycle  and  run-in  remain  the  same;  see 
Figure  2  (compare  histograms  (a)  (asynchronous  update)  with  histograms  (b)  (syn¬ 
chronous  update)).  The  increase  in  the  number  of  cycles  under  synchronous  update 
is  rather  dramatic — an  average  of  213  cycles  is  found  in  400  runs  (under  different 
initial  conditions).  When  the  number  of  different  initial  conditions  is  increased  to 
1000  the  number  of  found  cycles  is  increased  to  428  (see  Table  1),  indicating  that 
the number of potential cycles of the network is not exhausted yet. Thus, the number
of cycles in the lattice-connected networks under synchronous update is > 4N
(N is the number of nodes) for 100-node networks.

4.2  EFFECT  OF  EXCLUSION  OF  TWO  BOOLEAN  FUNCTIONS:  TAUTOLOGY 
AND  CONTRADICTION 

Two  sets  of  Boolean  functions  are  studied: 

■  The  set  of  16  functions  consists  of  all  possible  Boolean  functions  occurring  with 
equal  probability,  and 

■  The  set  of  14  functions  lacks  Contradiction  and  Tautology;  the  rest  of  the 
functions  are  equally  distributed. 

Tautology  and  Contradiction  are  the  most  powerful  forcing  functions:  their  out¬ 
put  is  independent  of  the  input.  One  would  expect  networks  without  those  functions 
to  possess  weaker  forcing  structures. 
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BOTH  RANDOMLY  AND  LATTICE-CONNECTED  NETWORKS  show  a  tendency  toward 
longer  run-ins  and  longer  cycle  length  in  networks  with  14  Boolean  functions:  see 
Table 1. The length of the cycle and of the run-in increases approximately twofold
for lattice-connected nets and 3-4 times for randomly connected networks with 100
nodes  under  asynchronous  update.  The  effect  is  very  different  under  the  synchronous 
update  (see  next  section). 

4.3  JOINT  EFFECT  OF  THE  SYNCHRONOUS  UPDATE  AND  EXCLUSION  OF 
CONTRADICTION  AND  TAUTOLOGY 


FOR THE RANDOMLY CONNECTED NETWORKS: The exclusion of Contradiction
and Tautology functions can be seen as having a destabilizing effect on the network: it
takes longer for the network to find a cycle. Synchronous update has a similar
effect: it can increase the number of potential cycles or the length of the run-in
and of the cycle. Individually those effects are mild, increasing values only by
a factor of two. However, joining the two effects produces networks with interesting
properties. Their run-in length increases dramatically: from a 9-step average run-in,
it increases to more than 450 steps (in 50% of the simulated networks; see
Table 1, compare lines 1 and 4). It is interesting to note that this effect is specific
to the length of the run-in, while cycle length and number of cycles are affected
only mildly; see Figure 3. Furthermore, run-in length does not appear to be distributed
evenly: networks can be divided into two classes, those with relatively short run-ins
(average is 56) and those with very long run-ins (exceeding 450). Any network of
this type (synchronous update, 14 Boolean functions, randomly connected) has an
equal probability of falling into either of the classes.

FOR THE LATTICE-CONNECTED NETWORKS: The joint effects of the synchronous
update and 14 Boolean functions increase an already very large number of potential
cycles to approximately 6N (N is the number of nodes) and probably higher,
since the ceiling of the number of cycles had not been reached during these simulations.
Interestingly, this size of lattice-connected networks (100 nodes) does not
show the significant change in cycle length or length of the run-in that is characteristic
of randomly connected networks.
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FIGURE  3  Effect  of  the  method  of  node  update  and  an  exclusion  of  two  Boolean 
functions  on  the  randomly  connected  networks. 


TABLE  2  Behavior  of  networks  with  900  nodes. 
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4.4  INCREASING  NUMBER  OF  NODES  IN  THE  NETWORKS 

All  of  the  above  results  are  from  the  networks  with  100  nodes.  900-node  networks 
were  simulated  on  a  Cray  X-MP  supercomputer.  Three  types  of  the  networks  were 
simulated;  only  a  limited  subset  of  Boolean  assignments  were  tested,  but  some  basic 
properties  of  the  networks  of  larger  sizes  might  be  discerned;  see  Table  2. 

FOR THE RANDOMLY CONNECTED SYNCHRONOUSLY AND ASYNCHRONOUSLY
UPDATED NETWORKS WITH 16 FUNCTIONS: The number of cycles and the cycle
length increase proportionally with the size of the network. All networks found a
cycle within the first 600 steps of the network. 40% found a cycle after 350 steps of the
network.

FOR LATTICE-CONNECTED ASYNCHRONOUSLY UPDATED NETWORKS: Fifty percent
of the networks could not reach a cycle after 350 or 600 steps. Those networks
that do reach a cycle have long cycles and almost every cycle found is identified as
unique. Therefore, it is yet undetermined how many potential cycles there are in
networks  with  900  nodes.  This  type  of  the  network  has  a  small  number  of  cycles  and 
the  shortest  average  run-in  from  among  all  types  of  networks  with  100  nodes,  but 
its  behavior  changes  radically  with  an  increase  in  the  size  of  the  network;  compare 
line  5  in  Table  1  and  lines  four  and  five  in  Table  2. 


4.5  DISTRIBUTION  OF  CYCLE  LENGTH 

One  of  the  interesting  questions  is  whether  there  is  a  tendency  toward  a  specific 
cycle  length  (longer  or  shorter  than  average)  in  the  frequently  occurring  cycles. 
For that purpose two methods were used for measuring cycle length: weighted and
not-weighted cycle length. For the weighted cycle length, each cycle contributes to
the average cycle length proportionally to the frequency of its occurrence. In the
not-weighted average, every type of cycle is considered only once.
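
The two averages can be illustrated with a short sketch. The input format (one `(cycle identifier, cycle length)` pair per run) and the function name are assumptions chosen for the example; the numbers are invented and are not simulation results.

```python
from collections import Counter


def cycle_length_averages(observed):
    """Return (weighted, not-weighted) average cycle length.

    `observed` holds one (cycle_id, cycle_length) pair per simulation run,
    so a cycle that is reached often appears many times in the list.
    """
    counts = Counter(cycle_id for cycle_id, _ in observed)
    lengths = dict(observed)              # one length per distinct cycle
    weighted = sum(lengths[c] * n for c, n in counts.items()) / len(observed)
    not_weighted = sum(lengths.values()) / len(lengths)
    return weighted, not_weighted


# Hypothetical data: cycle "A" (length 10) reached in 6 runs, cycle "B" (length 2) in 2 runs.
runs = [("A", 10)] * 6 + [("B", 2)] * 2
print(cycle_length_averages(runs))  # prints (8.0, 6.0)
```

In this invented example the weighted average (8.0) exceeds the not-weighted one (6.0) because the longer cycle is also the one reached more often, which is the kind of correlation the comparison is meant to detect.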

It  is  interesting  to  notice  that  there  is  a  general  tendency  toward  higher  values 
for  the  weighted  cycle  length,  indicating  a  correlation  between  longer  cycle  length 
and frequently occurring cycles. The increase is more noticeable in the randomly
connected network with 14 Boolean functions, where the weighted average cycle length
is 40% longer.


5.  DISCUSSION 

Different types of networks can be ranked according to the degree to which they
exhibit characteristics appropriate for a biological model. Both lattice-connected
and randomly connected networks perform well under asynchronous update with
16 Boolean functions and a small network size (100 nodes here). Their
actual number of cycles and cycle length are close to the predicted values of √N
(N is the number of nodes). Any deviation from this set of parameters introduces
a destabilizing effect into the behavior of the networks.

FOR  RANDOMLY  CONNECTED  NETWORKS  The  effect  of  synchronous  node  update 
or  exclusion  of  the  Contradiction  and  Tautology  Boolean  functions  is  small,  but 
the  combination  of  the  two  effects  causes  a  dramatic  increase  in  the  length  of  the 
run-in.  There  is  no  clear  biological  interpretation  of  the  length  of  run-in:  it  could  be 
correlated  with  the  length  of  transition  of  the  network  between  different  cycles  in 
the  presence  of  noise.  During  the  run-in  the  network  is  not  committed  to  a  specific 
cycle.  In  biological  terms  it  means  that  the  transition  between  different  cell  types 
takes  a  very  long  time  in  comparison  with  the  cell’s  lifetime,  which  is  a  poor  model 
of  a  real  cell.  It  is,  therefore,  important  to  test  how  long  it  takes  for  this  type  of 
network  to  move  between  the  cycles  (if  noise  is  introduced). 

Also, only the most “well behaved” type of network (asynchronous update,
16 Boolean functions) was tested on the larger size networks; the behavior of other
types  of  the  randomly  connected  networks  under  the  increased  size  of  the  network 
is  unknown. 

FOR LATTICE-CONNECTED NETWORKS Synchronous update has a quite different
effect: in the case of randomly connected networks, it increases the length of cycles
and run-ins; in the lattice-connected networks, it increases the number of potential cycles
dramatically. Exclusion of the Contradiction and Tautology functions augments
this effect; the network loses some of its self-organizing properties.

When network size was increased from 100 to 900 nodes, even the “best behaved”
lattice network (asynchronous update with 16 Boolean functions) lost its
self-organizing behavior: its run-ins increased more than 50 times (in 50% of the
cases) and the number of cycles is as yet undetermined, but is at least on the order of
N (N is the size of the network) and, probably, much higher (see below).

The above results indicate that lattice-connected networks are able to model
biological behavior only under a very limited subset of possible conditions. Loss of
the self-organizing properties with increasing size appears to be the most crucial
disadvantage: real biological cells have several thousand genes and therefore should
be modeled with at least that many nodes in the network. Lattice-connected networks
are clearly incapable of doing so.

THE TEMPORAL WAY IN WHICH NODES OF THE NETWORK CALCULATE their states
(local rule) has a strong impact on the global behavior of the lattice-connected
network (it affects the number of cycles of the network). Asynchronous update results
in  a  smaller  number  of  potential  cycles  of  the  network  when  compared  to  networks 
under  synchronous  update.  Why  is  this  so?  Asynchronous  update  allows  much  faster 
propagation  of  the  signal  through  the  network;  the  signal  can  traverse  part  of  the 
forcing  loop  or  forcing  structure  within  one  update  iteration.  Forcing  structures 
which  are  responsible  for  the  self-organizing  effect  of  the  network  are  formed  sooner. 
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Once  part  of  the  structure  is  formed,  it  becomes  insensitive  toward  all  non-forcing 
inputs.  Somehow,  rapid  formation  of  the  forcing  structures  limits  the  number  of 
potential cycles the network can reach. I currently do not have an explanation for this
phenomenon nor for the fact that it affects lattice-connected networks much more strongly
than randomly connected networks. My guess is that certain combinations of the
oscillating  groups  (this  term  is  borrowed  from  Allan  et  al.^  *)  can  only  be  formed 
in  the  absence  of  the  nearby  forcing  structure,  which  in  the  case  of  asynchronous 
update  appears  just  too  fast.  Every  cell  cycle  is  a  combination  of  the  local  oscillating 
groups^'®;  thus,  fewer  oscillation  groups  will  mean  fewer  cycles.  This  point  should 
be  investigated  further. 

Although  asynchronous  update  does  not  appear  to  be  a  pure  one-iteration 
update, it is a more realistic model from a biological point of view: different components
within a cell (enzymes, m-RNAs) have different thresholds for synthesis,
different  stability,  etc.  A  more  detailed  model  is  presented  by  Thomas.*^  It  is  im¬ 
portant  to  mention  that  the  technique  of  node  update  has  a  rather  modest  effect 
on  the  length  of  the  cycles  and  the  run-in  length. 

The  dramatic  change  in  the  behavior  of  lattice-connected  networks  with  an 
increase  in  size  could  be  explained  by  the  unique  geometry  of  connections  between 
the nodes. Lattice connection between nodes forces the formation of very small
local forcing loops, which, in turn, contribute to the formation of oscillating groups.
Oscillating  groups  contribute  directly  to  the  number  of  the  potential  cycles:  the 
number  of  potential  cycles  is  proportional  to  the  number  of  combinations  of  the 
oscillating  groups.  An  increase  in  the  network  size  increases  the  number  of  the  local 
loops  linearly  and,  therefore,  causes  an  exponential  increase  in  combinations  of  the 
oscillating  groups  and  number  of  potential  cycles.  In  randomly  connected  networks 
random connections between the nodes ensure longer forcing loops and a relatively
small  increase  in  the  number  of  loops,  preventing  rapid  increase  in  potential  cycles. 
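
The counting argument can be written out under a simplifying assumption that is not stated in the text: suppose the network contains m oscillating groups that behave independently of one another, and that group i can settle into c_i ≥ 2 distinct modes. The symbols m, c_i, and α are introduced here only for illustration.

```latex
\[
  \#\,\text{combinations} \;=\; \prod_{i=1}^{m} c_i \;\ge\; 2^{m},
  \qquad
  m \approx \alpha N
  \;\Longrightarrow\;
  \#\,\text{combinations} \;\ge\; 2^{\alpha N}.
\]
```

If the number of local loops, and with it m, grows linearly with the network size N (constant α), the lower bound grows exponentially in N. Going from N = 100 to N = 900 multiplies the exponent by nine. When m grows only slowly with N, as suggested for randomly connected networks, the bound grows correspondingly slowly.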

Length of a cycle appears to be the most stable property among the three measured
parameters (cycle length, run-in length, and number of cycles). It showed a
significant  increase  only  in  the  case  of  large  lattice-connected  networks. 

Even within a particular type of network, the statistical behavior is not uniform
and may depend on the Boolean assignments and node connections within
the individual network.


SUMMARY 

Behavior of lattice-connected and randomly connected networks is different, and it
converges only under a very small subset of parameters. Randomly connected networks
show strong self-organizing behavior under most of the studied conditions. Lattice-connected
networks, on the contrary, exhibit a very large number of potential cycles
under most of the parameters studied and, therefore, are a poor model of the
biological cell.
The Production of Solitons By Optimal Driving Forces


In general, nonlinear waves are not stable in a chain of finite length. Since they
have a finite lifetime, it is important to investigate the production of nonlinear
waves, e.g., the production of solitons. A general feature of nonlinear waves is
amplitude-frequency coupling, which makes excitation by sinusoidal driving
forces very inefficient. In addition, the response is usually very complex. We
present a method to calculate special aperiodic driving forces which generate
nonlinear waves very efficiently. The response to these driving forces is very simple.


INTRODUCTION 

When a nonlinear oscillator is perturbed by a sinusoidal force, the response is
comparatively small in amplitude and does not fulfill any well-defined resonance
condition, even when the frequency of the driving force coincides with a peak
(resonance) in the power spectrum of the unperturbed system. Outside the region of
entrainment, the response is complicated, in many cases chaotic. In order to
obtain a large, simple, predictable response, the frequency of the driving force has
to be varied in such a way that it coincides at all amplitudes with the
characteristic frequency of the oscillator. Since the characteristic frequencies of
nonlinear oscillators usually depend on the amplitude, the optimal driving force has
to be aperiodic. Recently a method to calculate those optimal driving forces has been
presented. We apply this method in order to calculate optimal driving forces for
the creation of solitons.
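
The amplitude dependence of the characteristic frequency can be illustrated with a minimal sketch for a single pendulum, u'' = -sin(u), used here as a stand-in for a nonlinear oscillator. The function name `pendulum_period`, the time step, and the chosen amplitudes are assumptions made for this illustration only.

```python
import math


def pendulum_period(amplitude, dt=1e-3):
    """Period of u'' = -sin(u) released from rest at u = amplitude (0 < amplitude < pi)."""
    u, v, t = amplitude, 0.0, 0.0
    while True:
        v -= dt * math.sin(u)   # semi-implicit Euler: update the velocity first
        u += dt * v             # then the position, using the new velocity
        t += dt
        if v >= 0.0:            # the velocity returns to zero after half a period
            return 2.0 * t


for amplitude in (0.1, 1.0, 2.0, 2.5):
    period = pendulum_period(amplitude)
    print(f"amplitude {amplitude:3.1f}  period {period:6.3f}  "
          f"frequency {2.0 * math.pi / period:5.3f}")
```

The period grows with amplitude, so a sinusoidal force tuned to the small-amplitude frequency falls out of step as soon as the oscillation grows. This is why a force that stays resonant has to change its frequency along the way.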


CREATION  OF  SOLITONS  BY  APERIODIC  DRIVING  FORCES 

Nonlinear waves and solitons provide good mathematical models in various fields
of science. In most experimental systems solitons have a long but finite lifetime.
Therefore we investigate the creation of solitons by external perturbations. We
assume that the dynamics of the experimental system can be modeled by a sine-Gordon
equation

$$u_{xx} - u_{tt} - \sin(u) = F(x,t) \qquad (1)$$

where u(x,t) is the field amplitude, which depends on space x and time t, and where
F is an external perturbation which depends only on time and space. In order to
calculate resonant driving forces, we integrate, following Hübler and Lüscher,
the following goal dynamics

$$w_{xx} - w_{tt} - B\sin(w) + w_t\,\Theta(|x - 50| - 2.5) = 0 \qquad (2)$$

where B is a parameter and where Θ is Heaviside's step function. We take circular
or fixed boundaries at x = 0 and x = 100. The simulation is finished at the time T
when |w(x,T)| > π. The initial conditions are w(x,0) = 0 and w(50,0) = 0.001.
The driving force results from

$$F(x,t) = -w_t(x,t)\,\Theta(|x - 50| - 2.5) \qquad (3)$$

and F(x,t) = 0 for t > T. The basic idea is that, if the structures of Eqs. (1)
and (2) are the same, i.e., B = 1, then u(x,t) = w(x,t) is a special solution of Eq.
(1). In this case the energy transfer $P(t) = \int F u_t\,dx$ is positive for all t, i.e.,
no energy is reflected, since F is proportional to $w_t$. Therefore the coefficient of
absorption is 100%, the reaction power is zero, and the perturbation is resonant.
The special space dependence of F was chosen in order to create solitons instead of
other  nonlinear  waves.  Figure  1(a)  shows  the  result  of  a  numerical  simulation  of  the 
response  of  the  sine-Gordon  system.  For  the  integration  we  use  100  homogeneously 
distributed  break  points.  The  initial  amplitudes  of  u  at  these  break  points  are 
randomly  distributed  in  the  interval  [—10”^  10"^]  and  the  initial  velocity  is  set 
equal to zero. Figure 1(a) illustrates that nearly all the transferred energy is used for
the creation of a soliton-antisoliton pair, since there are no additional waves in the
chain.  The  situation  is  completely  different  if  we  apply  a  sinusoidal  driving  force 
of the same magnitude for the same period of time and in the same region of the
chain. In this case no solitons are created (see Figure 1(b)); instead, very complicated
dynamics result from the misfit between the driving frequency and the eigenfrequency
of the system (Figure 2(a)). This example illustrates that the response of a nonlinear
system is usually very complicated, whereas the response can be predictable
and simple if special aperiodic driving forces are used, since u(x,t) = w(x,t) and
w(x,t) can be calculated in advance for an infinitely long period of time.
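
The construction of Eqs. (1) to (3) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the integration scheme used for Figure 1: it uses a unit grid spacing, fixed ends, a semi-implicit Euler step, identical noise-free initial conditions for u and w, and it reads the step-function factor as a window that confines the force to within 2.5 units of the chain center. The names `WINDOW`, `acceleration`, and `run` are hypothetical.

```python
import math

N = 101      # grid points at x = 0, 1, ..., 100 (unit spacing assumed)
DT = 0.05    # time step (assumed; kept well below the grid spacing)
# Assumption: the force acts only within 2.5 units of the chain center.
WINDOW = [1.0 if abs(i - 50.0) < 2.5 else 0.0 for i in range(N)]


def acceleration(field, force, B):
    """Return f_tt = f_xx - B*sin(f) - force at interior points (ends held fixed)."""
    acc = [0.0] * N
    for i in range(1, N - 1):
        curvature = field[i - 1] - 2.0 * field[i] + field[i + 1]
        acc[i] = curvature - B * math.sin(field[i]) - force[i]
    return acc


def run(B=1.0, t_max=60.0):
    """Integrate the goal dynamics w (Eq. 2) and the driven system u (Eq. 1) together."""
    w = [0.0] * N
    w[50] = 1e-3                 # small seed at the center, as in the text
    u = list(w)                  # same initial state for the driven system
    w_t = [0.0] * N
    u_t = [0.0] * N
    t = 0.0
    # Stop when the goal dynamics exceeds pi, or at t_max as a safety limit.
    while max(abs(x) for x in w) <= math.pi and t < t_max:
        force = [-w_t[i] * WINDOW[i] for i in range(N)]   # Eq. (3)
        acc_w = acceleration(w, force, B)      # goal dynamics, model parameter B
        acc_u = acceleration(u, force, 1.0)    # experimental system, Eq. (1)
        for i in range(N):
            w_t[i] += DT * acc_w[i]
            w[i] += DT * w_t[i]
            u_t[i] += DT * acc_u[i]
            u[i] += DT * u_t[i]
        t += DT
    return u, w, t


u, w, t_end = run(B=1.0)
print("B = 1.0: max |u - w| =", max(abs(a - b) for a, b in zip(u, w)))
u2, w2, _ = run(B=0.5)
print("B = 0.5: max |u - w| =", max(abs(a - b) for a, b in zip(u2, w2)))
```

With B = 1 the driven system reproduces the goal dynamics exactly, which is the sense in which the response is known in advance. With a mismatched B the two trajectories separate.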


FIGURE 1 The field amplitude u versus x (0 to 100) after an aperiodic optimal
stimulation (a) and after a sinusoidal stimulation (b).


FIGURE 2 The field amplitude u(50, t) versus time for a sinusoidal perturbation (a)
and the ratio between the reflected and the absorbed energy versus the parameter B
of the model (b).
NONLINEAR RESONANCE SPECTROSCOPY

An essential condition for obtaining such a simple response is to have a correct
model. Otherwise, u differs from w, the dynamics is usually chaotic, and an
essential part of the energy is reflected. Figure 2(b) shows the ratio R between the
reflected and the absorbed energy versus B. R reaches its minimum value when the
parameters of the model and the parameters of the goal dynamics coincide. In this
case the response is simple and predictable for an infinitely long period of time, while
in all other cases, including periodic perturbations, a very complicated response was
found. By a systematic search for the minimum of the reflected energy as a function
of the parameters of the model, the correct magnitude of these parameters can be
determined.
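
The parameter search can be sketched with a single-oscillator analog of Eqs. (1) to (3), obtained by dropping the spatial coupling. This simplification is an assumption of the sketch and is not the chain simulation behind Figure 2(b). The sign of the power is chosen so that positive values mean energy flowing into the driven oscillator, and the reflected energy is taken to be the time integral of the negative part of the power. Both conventions, and the name `reflected_to_absorbed`, are assumptions made here.

```python
import math


def reflected_to_absorbed(B, dt=0.01, t_max=200.0):
    """Ratio of reflected to absorbed energy when the model uses parameter B."""
    w, w_t = 1e-3, 0.0      # goal dynamics:   w_tt = -B*sin(w) + w_t
    u, u_t = 1e-3, 0.0      # driven system:   u_tt = -sin(u) - F
    e_abs = 0.0
    e_ref = 0.0
    t = 0.0
    while abs(w) <= math.pi and t < t_max:
        force = -w_t                    # F = -w_t, the analog of Eq. (3)
        power = -force * u_t            # energy flow into the driven oscillator
        if power > 0.0:
            e_abs += power * dt         # absorbed part
        else:
            e_ref += -power * dt        # reflected part (counted as positive)
        w_t += dt * (-B * math.sin(w) - force)
        w += dt * w_t
        u_t += dt * (-math.sin(u) - force)
        u += dt * u_t
        t += dt
    return e_ref / e_abs


for B in (0.6, 0.8, 1.0, 1.2, 1.4):
    print(f"B = {B:3.1f}   R = {reflected_to_absorbed(B):.4f}")
```

In this analog the ratio vanishes when the model parameter matches the driven system (B = 1) and is positive otherwise, so scanning B for the smallest reflected energy recovers the correct parameter.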


MORE  GENERAL  CONSIDERATIONS 

We now calculate how much of the energy provided by the external driving force
feeds into the system. The total energy of the system is:

$$E(t) = \int \left( \frac{u_t^2}{2} + \frac{u_x^2}{2} + \cos(u) \right) dx. \qquad (4)$$

The absorption energy and reflection energy of the system are:

$$E_{abs} = \int_0^T P\,\Theta(P)\,dt \qquad (5)$$

^re/  =  (6) 

where $P = \int F u_t\,dx$ is the power pumped into the system. The efficiency is:

f  Fab s  Fref\ 

The numerical calculation we did shows that $E_{ref} = 0.0$ and that the efficiency
is 0.975, which is quite close to one. We see that all the energy pumped in is
absorbed by the system. The discrepancy of the efficiency from 1 is caused by errors
of the specific numerical methods used in the calculation, which are not intrinsic to
the problem we are considering.
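
The energy bookkeeping can be illustrated with the same single-oscillator analog (no spatial coupling), which is an assumption of this sketch. Here the stored energy uses the pendulum potential 1 - cos(u), the form consistent with the -sin(u) restoring term in the sketch, and the efficiency is taken as energy gained divided by net energy delivered. That definition, the sign convention for the power, and the names `energy` and `drive_to_pi` are illustrative assumptions and need not coincide with the definitions used for the reported value.

```python
import math


def energy(u, u_t):
    """Energy of one pendulum-like oscillator (kinetic plus 1 - cos potential)."""
    return 0.5 * u_t * u_t + 1.0 - math.cos(u)


def drive_to_pi(dt=0.01, t_max=200.0):
    """Drive u with the force computed from matched goal dynamics (B = 1)."""
    w, w_t = 1e-3, 0.0
    u, u_t = 1e-3, 0.0
    start = energy(u, u_t)
    e_abs = 0.0
    e_ref = 0.0
    t = 0.0
    while abs(w) <= math.pi and t < t_max:
        force = -w_t
        power = -force * u_t            # positive means energy enters the oscillator
        if power > 0.0:
            e_abs += power * dt
        else:
            e_ref += -power * dt
        w_t += dt * (-math.sin(w) - force)
        w += dt * w_t
        u_t += dt * (-math.sin(u) - force)
        u += dt * u_t
        t += dt
    return energy(u, u_t) - start, e_abs, e_ref


gained, e_abs, e_ref = drive_to_pi()
efficiency = gained / (e_abs - e_ref)
print("reflected energy:", e_ref)
print("efficiency:", round(efficiency, 3))
```

The reflected energy is zero for the matched model, and the efficiency differs from one only through the time-stepping error. The size of that error depends on the step and scheme chosen here and is not comparable to the value quoted in the text.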

There  are  possible  difficulties  with  controlling  systems  represented  by  partial 
differential  equations  regardless  of  the  method.  One  major  drawback  of  controlling 
spatially  extended  systems  is  the  possible  complexity  of  the  driving  force.  Since  this 
force  is  spatially  dependent,  it  may  not  be  possible  to  apply  the  force  to  the  system 
once  it  is  calculated. 
There are several possible solutions worth exploring. The first is a
simplification of the driving term by spatial Fourier decomposition. Control, or at
least favorable modification, of the system might be achieved by applying a simplified
f(x,t) on a finite set of points in the domain. In the event that it is not possible
to spatially modulate the driving force, a suitable f(t) might be found by
comparing the local attractors of the field variable at various points in the domain.
Some systems have small localized regions which are extremely sensitive to external
perturbations. For situations where these perturbations influence a major portion of
the system, our control theory holds promise.
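
The first suggestion, simplifying the spatial form of the driving term by Fourier decomposition, can be sketched as follows. The box-shaped profile stands in for the localized spatial factor of the force at one instant; the profile, the grid size, and the names `box_profile`, `truncate_modes`, and `rms_error` are assumptions made for illustration.

```python
import cmath

N = 101   # number of grid points (assumed)


def box_profile(center=50, half_width=2):
    """Spatial factor of a localized force: 1 near the center, 0 elsewhere."""
    return [1.0 if abs(j - center) <= half_width else 0.0 for j in range(N)]


def truncate_modes(profile, n_modes):
    """Keep only spatial Fourier modes with |k| <= n_modes (plain O(N^2) DFT)."""
    n = len(profile)
    coeffs = {}
    for k in range(-n_modes, n_modes + 1):
        coeffs[k] = sum(profile[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                        for j in range(n)) / n
    return [sum(c * cmath.exp(2j * cmath.pi * k * j / n)
                for k, c in coeffs.items()).real
            for j in range(n)]


def rms_error(a, b):
    """Root-mean-square difference between two profiles."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


profile = box_profile()
for n_modes in (3, 10, 20):
    error = rms_error(profile, truncate_modes(profile, n_modes))
    print(f"{n_modes:2d} modes kept: rms error {error:.4f}")
```

Keeping fewer modes gives a smoother force that is easier to apply, at the price of a larger deviation from the calculated force. How many modes are needed for control is a question the sketch does not answer.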

Another possible problem is the magnitude of the driving force. Situations
may arise where the energy required to apply F(x,t) forbids its use. The size of
the driving force is directly related to how far the goal and model dynamics are
separated in function space. Since it is the phase information, not the magnitude
of the driving term, that is important, we can replace F(x,t) with δF(x,t), where
δ ≪ 1, in the system, provided boundary correction terms are not employed. The
result will be that experimental systems will entrain to the goal dynamics more
slowly.

If we apply a force locally to a string, the curvature of the string is proportional
to the force. Since the driving force we now use is proportional to $w_t$, we would
expect $w_{xx} = a w_t$, where a is a constant, in the middle of the region of applied
force. Indeed, our numerical calculation shows that $w_{xx}(50,t) \approx -0.2\,w_t(50,t)$.
Now if we substitute this relation into the model equation, the system becomes:

$$u_{xx} - u_{tt} - \sin(u) = F_1 \qquad (8)$$

$$a w_t - w_{tt} - \sin(w) = F \qquad (9)$$

$$F_1 = a w_t (2 + A) \qquad (10)$$

where A is a parameter in the range from -1.0 to 1.0. The force is at least $a w_t$ in
order to overcome the friction; a larger force will drive the system and possibly
produce solitons.

Now  we  can  see  that  the  model  equation  (or  goal  equation)  is  an  ordinary 
differential  equation,  so  we  can  numerically  integrate  this  equation  to  get  the  driving 
force. We apply this homogeneous driving force to the experimental system to see if
we can drive the system and produce solitons.

Our numerical simulation shows that if we use the initial conditions u(x,0) = 0,
w(x,0) = 0.01, $w_t(50,0) = 0.01$, the whole system (except the boundary points)
moves together as time goes on. This is easy to understand, since a homogeneous
driving force exerted on a system with a uniform initial distribution will
lead the whole system to move simultaneously. Apparently no solitons are
produced (Figure 3).
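
The symmetry argument can be checked with a short sketch: a chain with circular boundaries, driven by a force that is the same at every point. The time dependence of the force in `homogeneous_force` is an arbitrary hypothetical choice, not the force obtained from Eqs. (9) and (10), and the grid, time step, and noise amplitude are likewise assumptions.

```python
import math
import random

N = 100     # chain sites with circular boundaries (assumed)
DT = 0.05   # time step (assumed)


def homogeneous_force(t):
    """Hypothetical force that depends on time only, not on position."""
    return -0.2 * math.sin(0.8 * t)


def final_spread(noise, steps=400, seed=0):
    """Drive the chain and return max(u) - min(u) at the end of the run."""
    rng = random.Random(seed)
    u = [noise * (rng.random() - 0.5) for _ in range(N)]
    u_t = [0.0] * N
    for n in range(steps):
        force = homogeneous_force(n * DT)
        acc = [u[(i - 1) % N] - 2.0 * u[i] + u[(i + 1) % N]
               - math.sin(u[i]) - force
               for i in range(N)]
        for i in range(N):
            u_t[i] += DT * acc[i]
            u[i] += DT * u_t[i]
    return max(u) - min(u)


print("uniform start:", final_spread(noise=0.0))
print("noisy start:  ", final_spread(noise=0.2))
```

Starting from a uniform state, every site receives the same update and the chain moves as a whole, so no localized structure can form. Only a nonuniform initial state gives the homogeneous force something to act on differently from site to site.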

If we add some noise to the experimental system (that is, take into account
the effect of temperature), the situation is different. The initial distribution of the
system is no longer uniform, so there are many different frequency modes in the
system, compared with just one mode in the uniform system. We would expect
that even the homogeneous driving force will excite (or resonate with) one or more
specific frequency modes of the nonuniform system, amplify them, and therefore
produce solitons. So we use a pointwise estimate w to calculate the driving force,
and then we use the resulting driving force as a homogeneous driving force. This
might sound inconsistent. In principle we need to apply a very local driving force in
order to stay consistent; however, if the noise in the system produces local maxima
which come into resonance with the driving force, the interaction is local even if the
driving force is homogeneous.

FIGURE 3 The whole system moves up. Here F(x) or F1(x) is U(x).

FIGURE 4 The production of a soliton by an optimal driving force.

The numerical simulation results are shown in Figure 4. We find that with
the initial distribution w(x,0) = 0.1, u(x,0) = 0.2(ranf() - 0.5), $F = 0.2\,w_t$,
$F_1 = 2.4F$, one soliton is produced. The result is quite sensitive to the driving force
we choose and to the noise of the system. Any small deviation of these parameters
would not lead to soliton production. We only see one soliton produced since the form
of our variationally derived force F is a special solution of the differential equation,
not the general solution. So it can only excite certain modes of the nonlinear system.
The sensitive dependence of the result on the driving force can be understood,
since a larger driving force will move the whole system and a smaller driving force
will not be enough to create a soliton. The sensitive dependence of the result on
temperature or noise can be understood since, for lower temperature or noise, the
degree of nonuniformity is small, and for higher temperature the noise becomes so
large that it overwhelms the system, which is not a realistic situation.
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